fil/docs/reference/environment-variables.md

# Environment Variables Reference

Configuration precedence in Kreuzberg follows this order (highest to lowest):

1. **Environment Variables** - Highest priority, overrides all other sources
2. **Configuration Files** - TOML, YAML, or JSON config files
3. **Defaults** - Built-in sensible defaults

This document covers all KREUZBERG\_\* environment variables for version 4.3.8.

## When to Use Environment Variables

Environment variables are ideal for:

- **Container/Cloud Deployments**: Docker, Kubernetes, serverless environments where config files are impractical
- **CI/CD Pipelines**: Override settings per environment (dev, staging, production)
- **Simple Overrides**: Changing one or two settings without managing a config file
- **Secrets Management**: Using secret management systems that inject values as env vars

For complex configurations with many settings, configuration files are recommended:

```toml title="Example Configuration File"
# kreuzberg.toml is cleaner for multiple settings
[ocr]
language = "eng"
backend = "tesseract"

[chunking]
max_chars = 2000
max_overlap = 300
```

## API Server Configuration

These variables control the Kreuzberg server's network behavior and request handling.

### KREUZBERG_HOST

**Type**: `String`
**Default**: `127.0.0.1`
**Valid Values**: Any IPv4 or IPv6 address, or hostname

The server bind address. Use `0.0.0.0` to listen on all interfaces.

```bash title="Server Bind Address Examples"
# Listen only on localhost (default)
export KREUZBERG_HOST=127.0.0.1

# Listen on all interfaces (Docker, cloud deployments)
export KREUZBERG_HOST=0.0.0.0

# Listen on specific interface
export KREUZBERG_HOST=192.168.1.100
```

### KREUZBERG_PORT

**Type**: `u16` (1-65535)
**Default**: `8000`

The server port number.

```bash title="Server Port Examples"
export KREUZBERG_PORT=3000
export KREUZBERG_PORT=8080
```

**Error**: Port must be a valid u16 number:

```text
KREUZBERG_PORT must be a valid u16 number, got 'invalid': invalid digit found in string
```

### KREUZBERG_CORS_ORIGINS

**Type**: `String` (comma-separated list)
**Default**: Empty (allows all origins)

Whitelist of allowed CORS origins. When empty, the server accepts requests from any origin.

```bash title="CORS Origins Configuration"
# Allow all origins (default)
# unset KREUZBERG_CORS_ORIGINS

# Allow specific origins
export KREUZBERG_CORS_ORIGINS="https://api.example.com, https://app.example.com"

# Single origin
export KREUZBERG_CORS_ORIGINS="https://trusted.com"
```

**Security Warning**: Be explicit with CORS origins in production. Allowing all origins (`*`) means any website can call your API on behalf of users. In Kreuzberg, an empty list allows all origins - be intentional about this choice.

```bash title="CORS Security Best Practices"
# Production: Restrict to known origins
export KREUZBERG_CORS_ORIGINS="https://app.mycompany.com, https://admin.mycompany.com"

# Development: Can use wildcard, but understand the security implications
# Don't use wildcard in production unless absolutely necessary
```

### KREUZBERG_MAX_REQUEST_BODY_BYTES

**Type**: `usize` (bytes)
**Default**: `104857600` (100 MB)

Maximum size of HTTP request bodies. Prevents oversized requests from consuming server resources.

```bash title="Max Request Body Size Examples"
# 50 MB
export KREUZBERG_MAX_REQUEST_BODY_BYTES=52428800

# 200 MB
export KREUZBERG_MAX_REQUEST_BODY_BYTES=209715200

# 500 MB
export KREUZBERG_MAX_REQUEST_BODY_BYTES=524288000
```

**Note**: Both `KREUZBERG_MAX_REQUEST_BODY_BYTES` and `KREUZBERG_MAX_MULTIPART_FIELD_BYTES` control upload limits. Adjust both for consistent behavior.

### KREUZBERG_MAX_MULTIPART_FIELD_BYTES

**Type**: `usize` (bytes)
**Default**: `104857600` (100 MB)

Maximum size of individual multipart form fields. Controls the size of file uploads in multipart requests.

```bash title="Max Multipart Field Size Examples"
# 100 MB (default)
export KREUZBERG_MAX_MULTIPART_FIELD_BYTES=104857600

# 500 MB for large document processing
export KREUZBERG_MAX_MULTIPART_FIELD_BYTES=524288000

# 1 GB for extreme cases
export KREUZBERG_MAX_MULTIPART_FIELD_BYTES=1073741824
```

## Extraction Configuration

These variables control document extraction behavior, including OCR, text chunking, and caching.

### KREUZBERG_OCR_LANGUAGE

**Type**: `String` (ISO 639-1 or 639-3 language code)
**Default**: `eng` (English)

OCR language for scanned documents. Must be a valid language code recognized by the OCR backend.

```bash title="OCR Language Configuration"
# English (default)
export KREUZBERG_OCR_LANGUAGE=eng

# German
export KREUZBERG_OCR_LANGUAGE=deu

# French
export KREUZBERG_OCR_LANGUAGE=fra

# Spanish
export KREUZBERG_OCR_LANGUAGE=spa

# Chinese (Simplified)
export KREUZBERG_OCR_LANGUAGE=chi_sim

# Japanese
export KREUZBERG_OCR_LANGUAGE=jpn
```

**Supported Codes**: Language codes are backend-agnostic and automatically mapped to the appropriate format for each backend:

- **Tesseract codes** (ISO 639-3): `eng`, `deu`, `fra`, `spa`, `ita`, `por`, `rus`, `chi_sim`, `chi_tra`, `jpn`, `kor`
- **PaddleOCR codes**: `en`, `ch`, `french`, `german`, `korean`, `thai`, `greek`, `cyrillic`, `latin`, `arabic`, `devanagari`, `tamil`, `telugu`
- **ISO 639-1 codes**: `en`, `de`, `fr`, `es`, `ja`, `ko`, `zh`, `ru`, `ar`, `th`, `el`

All code formats are accepted regardless of backend — Kreuzberg automatically maps between them.

### KREUZBERG_OCR_BACKEND

**Type**: `String`
**Default**: `tesseract`
**Valid Values**: `tesseract`, `easyocr`, `paddleocr`

OCR engine to use for text extraction from images and scanned documents.

```bash title="OCR Backend Selection"
# Tesseract (open source, good for English)
export KREUZBERG_OCR_BACKEND=tesseract

# EasyOCR (better multilingual support, slower)
export KREUZBERG_OCR_BACKEND=easyocr

# PaddleOCR (fast, good accuracy across languages)
export KREUZBERG_OCR_BACKEND=paddleocr
```

**Performance Notes**:

- **tesseract**: Fastest, best for English and Latin scripts
- **easyocr**: Slower, excellent multilingual support
- **paddleocr**: Fast with good accuracy for many languages

### KREUZBERG_CHUNKING_MAX_CHARS

**Type**: `usize` (positive integer)
**Default**: `1000` (characters)

Maximum number of characters per text chunk. Smaller chunks are useful for LLM context windows.

```bash title="Chunk Size Configuration"
# Small chunks for token-constrained LLMs
export KREUZBERG_CHUNKING_MAX_CHARS=512

# Default: balanced for most use cases
export KREUZBERG_CHUNKING_MAX_CHARS=1000

# Larger chunks for fewer splits
export KREUZBERG_CHUNKING_MAX_CHARS=2000

# Very large chunks for comprehensive context
export KREUZBERG_CHUNKING_MAX_CHARS=4000
```

**Validation**: Must be greater than 0. Must be greater than `KREUZBERG_CHUNKING_MAX_OVERLAP`.

### KREUZBERG_CHUNKING_MAX_OVERLAP

**Type**: `usize` (non-negative integer)
**Default**: `200` (characters)

Character overlap between consecutive chunks. Maintains context across chunk boundaries.

```bash title="Chunk Overlap Configuration"
# No overlap (creates discontinuities)
export KREUZBERG_CHUNKING_MAX_OVERLAP=0

# Default: 20% overlap with 1000-char chunks
export KREUZBERG_CHUNKING_MAX_OVERLAP=200

# More overlap: 30% for better context continuity
export KREUZBERG_CHUNKING_MAX_OVERLAP=300

# High overlap for sensitive documents
export KREUZBERG_CHUNKING_MAX_OVERLAP=500
```

**Validation**: Must be less than `KREUZBERG_CHUNKING_MAX_CHARS`.

**Example Error**:

```text
Chunking overlap (500) cannot be greater than or equal to max_chars (1000)
```

### KREUZBERG_CACHE_ENABLED

**Type**: `Boolean` (`true` or `false`, case-insensitive)
**Default**: `true`

Enable or disable extraction result caching. Cache stores results to avoid reprocessing identical documents.

```bash title="Cache Enable/Disable"
# Enable cache (default, recommended for production)
export KREUZBERG_CACHE_ENABLED=true

# Disable cache (development, testing, or when cache is problematic)
export KREUZBERG_CACHE_ENABLED=false

# Case insensitive
export KREUZBERG_CACHE_ENABLED=TRUE
export KREUZBERG_CACHE_ENABLED=False
```

### KREUZBERG_OUTPUT_FORMAT

**Type**: `String`
**Default**: `plain`
**Valid Values**: `plain`, `markdown`, `djot`, `html`

Controls the text content format of extraction results. Determines how extracted text is formatted in the result output.

```bash title="Output Format Options"
# Plain text content only (default)
export KREUZBERG_OUTPUT_FORMAT=plain

# Markdown formatted output
export KREUZBERG_OUTPUT_FORMAT=markdown

# Djot markup format
export KREUZBERG_OUTPUT_FORMAT=djot

# HTML formatted output
export KREUZBERG_OUTPUT_FORMAT=html
```

**Use Cases**:

| Format     | Use Case                                                        |
| ---------- | --------------------------------------------------------------- |
| `plain`    | Raw extracted text without formatting                           |
| `markdown` | Structured text with headings, lists, emphasis (RAG, LLM input) |
| `djot`     | Lightweight markup, alternative to Markdown                     |
| `html`     | Rich formatted output for web display                           |

**Example:**

```bash title="Extract with markdown formatting"
export KREUZBERG_OUTPUT_FORMAT=markdown
kreuzberg
```

### KREUZBERG_TOKEN_REDUCTION_MODE

**Type**: `String`
**Default**: `off`
**Valid Values**: `off`, `light`, `moderate`, `aggressive`, `maximum`

Token reduction aggressiveness for compressing extracted text while preserving meaning. Useful when working with token-limited LLMs.

```bash title="Token Reduction Mode Options"
# No reduction (keep all text as-is)
export KREUZBERG_TOKEN_REDUCTION_MODE=off

# Light reduction: Remove common stopwords, minimal impact
export KREUZBERG_TOKEN_REDUCTION_MODE=light

# Moderate reduction: Balance between compression and meaning preservation
export KREUZBERG_TOKEN_REDUCTION_MODE=moderate

# Aggressive reduction: Significant compression, some detail loss
export KREUZBERG_TOKEN_REDUCTION_MODE=aggressive

# Maximum reduction: Extreme compression for token-constrained scenarios
export KREUZBERG_TOKEN_REDUCTION_MODE=maximum
```

**Impact on Tokens**:

| Mode         | Typical Reduction | Use Case                                    |
| ------------ | ----------------- | ------------------------------------------- |
| `off`        | 0%                | Full preservation, no compression           |
| `light`      | 10-15%            | Minimal impact, clean up obvious redundancy |
| `moderate`   | 25-35%            | Balanced approach for most scenarios        |
| `aggressive` | 40-50%            | Significant compression, still readable     |
| `maximum`    | 50-70%            | Extreme compression, lose some detail       |

## Runtime Configuration

Control cache location, debug output, and runtime behavior.

### KREUZBERG_CACHE_DIR

**Type**: `String` (file system path)
**Default**: Platform-specific global cache directory

Override the default cache directory for storing extraction cache, models, and intermediate files. When unset, Kreuzberg uses a platform-appropriate global cache:

- **Linux**: `~/.cache/kreuzberg/` (or `$XDG_CACHE_HOME/kreuzberg/`)
- **macOS**: `~/Library/Caches/kreuzberg/`
- **Windows**: `%LOCALAPPDATA%/kreuzberg/`

If the platform cache directory cannot be determined, Kreuzberg falls back to `~/.cache/kreuzberg/`, then `.kreuzberg/` in the current working directory as a last resort.

```bash title="Cache Directory Configuration"
# Default: uses platform-specific global cache (recommended)
# unset KREUZBERG_CACHE_DIR

# Store cache in specific location
export KREUZBERG_CACHE_DIR=/var/cache/kreuzberg

# Docker: Use volume mount
export KREUZBERG_CACHE_DIR=/data/kreuzberg-cache

# Development: Quick local cleanup
export KREUZBERG_CACHE_DIR=/tmp/kreuzberg-cache
```

**Directory Structure**: Kreuzberg creates subdirectories for different cache types:

```text
$KREUZBERG_CACHE_DIR/
  ocr/                    # OCR result cache
  embeddings/             # Chunk embedding cache
  extractions/            # Full extraction cache
```

### KREUZBERG_CI_DEBUG

**Type**: `Boolean` (presence check: set to any value to enable)
**Default**: Disabled (unset)

Enable detailed debug logging for CI environments. Outputs step-by-step timing and parameter information for OCR operations.

```bash title="Enable CI Debug Logging"
# Enable CI debug output
export KREUZBERG_CI_DEBUG=1
export KREUZBERG_CI_DEBUG=true
export KREUZBERG_CI_DEBUG=yes

# Output example:
# [kreuzberg::ocr] perform_ocr:start bytes=1024000 language=eng output=text use_cache=true
# [kreuzberg::ocr] perform_ocr:end duration_ms=2534
```

**Use Cases**:

- Debugging slow OCR operations
- Tracing cache hits/misses
- Performance profiling in CI pipelines
- Understanding extraction pipeline behavior

### KREUZBERG_DEBUG_OCR

**Type**: `Boolean` (presence check: set to any value to enable)
**Default**: Disabled (unset)

Enable OCR-specific debug output. Outputs diagnostic information about OCR decisions, fallbacks, and text coverage metrics.

```bash title="Enable OCR Debug Logging"
# Enable OCR debug logging
export KREUZBERG_DEBUG_OCR=1

# Output example:
# [kreuzberg::pdf::ocr] fallback=true non_whitespace=8543 alnum=7234 meaningful_words=312
# [kreuzberg::pdf::ocr] avg_non_whitespace=45.2 avg_alnum=38.1 alnum_ratio=0.847
```

**Diagnostic Information**:

- Whether OCR fallback was triggered
- Character counts (whitespace, alphanumeric)
- Word counts and coverage ratios
- Coverage thresholds and decisions

## Memory & Performance

Configure caching for string encoding operations to optimize performance.

### KREUZBERG_ENCODING_CACHE_MAX_ENTRIES

**Type**: `usize` (positive integer)
**Default**: `10000`

Maximum number of strings cached in the encoding cache. Each entry consumes memory proportional to string length.

```bash title="Encoding Cache Entry Limit"
# Default: reasonable for most applications
export KREUZBERG_ENCODING_CACHE_MAX_ENTRIES=10000

# Higher for very large batches
export KREUZBERG_ENCODING_CACHE_MAX_ENTRIES=50000

# Lower to reduce memory usage
export KREUZBERG_ENCODING_CACHE_MAX_ENTRIES=1000
```

### KREUZBERG_ENCODING_CACHE_MAX_BYTES

**Type**: `usize` (bytes)
**Default**: `104857600` (100 MB)

Maximum total size of cached strings in bytes. Once exceeded, least-used entries are evicted.

```bash title="Encoding Cache Size Limit"
# Default: 100 MB
export KREUZBERG_ENCODING_CACHE_MAX_BYTES=104857600

# Larger cache for high-throughput scenarios
export KREUZBERG_ENCODING_CACHE_MAX_BYTES=524288000  # 500 MB

# Smaller cache for memory-constrained environments
export KREUZBERG_ENCODING_CACHE_MAX_BYTES=10485760   # 10 MB
```

## LLM Integration

Configure LLM-powered features such as structured extraction, vision-based OCR, and provider-hosted embeddings.

### KREUZBERG_LLM_MODEL

**Type**: `String`
**Default**: None (must be set explicitly or via config)

Default LLM model for structured extraction. Uses [liter-llm](https://github.com/kreuzberg-dev/liter-llm) model format (`provider/model-name`).

```bash title="LLM Model Configuration"
# OpenAI
export KREUZBERG_LLM_MODEL=openai/gpt-4o-mini

# Anthropic
export KREUZBERG_LLM_MODEL=anthropic/claude-sonnet-4-20250514

# Local provider
export KREUZBERG_LLM_MODEL=ollama/llama3
```

### KREUZBERG_LLM_API_KEY

**Type**: `String`
**Default**: None

API key for the structured extraction LLM provider. When not set, liter-llm falls back to provider-standard environment variables (for example, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`).

```bash title="LLM API Key Configuration"
export KREUZBERG_LLM_API_KEY=sk-...
```

**Security Warning**: Prefer using provider-standard environment variables or a secrets manager over setting this directly. This variable is provided for cases where multiple providers are used and explicit key routing is needed.

### KREUZBERG_LLM_BASE_URL

**Type**: `String`
**Default**: None (uses provider default)

Custom base URL for the structured extraction LLM provider. Useful for self-hosted models, proxies, or alternative API-compatible endpoints.

```bash title="LLM Base URL Configuration"
# Custom OpenAI-compatible endpoint
export KREUZBERG_LLM_BASE_URL=https://api.example.com

# Local Ollama instance
export KREUZBERG_LLM_BASE_URL=http://localhost:11434
```

### KREUZBERG_VLM_OCR_MODEL

**Type**: `String`
**Default**: None (must be set explicitly or via config)

VLM (Vision Language Model) model for vision-based OCR. When configured, Kreuzberg can use a vision model as an OCR backend, sending document images directly to the VLM for text extraction.

```bash title="VLM OCR Model Configuration"
# OpenAI GPT-4o for vision OCR
export KREUZBERG_VLM_OCR_MODEL=openai/gpt-4o

# Anthropic Claude for vision OCR
export KREUZBERG_VLM_OCR_MODEL=anthropic/claude-sonnet-4-20250514
```

### KREUZBERG_VLM_EMBEDDING_MODEL

**Type**: `String`
**Default**: None (must be set explicitly or via config)

LLM model for provider-hosted embeddings. Instead of running local ONNX embedding models, Kreuzberg can delegate embedding generation to a cloud provider's embedding API.

```bash title="VLM Embedding Model Configuration"
# OpenAI embeddings
export KREUZBERG_VLM_EMBEDDING_MODEL=openai/text-embedding-3-small

# Cohere embeddings
export KREUZBERG_VLM_EMBEDDING_MODEL=cohere/embed-english-v3.0
```

**Note**: When `api_key` is not set in config, liter-llm falls back to provider-standard environment variables (for example, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`).

| Variable                        | Description                                        | Example                         |
| ------------------------------- | -------------------------------------------------- | ------------------------------- |
| `KREUZBERG_LLM_MODEL`           | Default LLM model for structured extraction        | `openai/gpt-4o-mini`            |
| `KREUZBERG_LLM_API_KEY`         | API key for structured extraction LLM provider     | `sk-...`                        |
| `KREUZBERG_LLM_BASE_URL`        | Custom base URL for structured extraction provider | `https://api.example.com`       |
| `KREUZBERG_VLM_OCR_MODEL`       | VLM model for vision-based OCR                     | `openai/gpt-4o`                 |
| `KREUZBERG_VLM_EMBEDDING_MODEL` | LLM model for provider-hosted embeddings           | `openai/text-embedding-3-small` |

---

## Testing Variables

Variables for development, testing, and quality assurance.

### KREUZBERG_RUN_FULL_OCR

**Type**: `Boolean` (presence check: set to any value to enable)
**Default**: Disabled (skips expensive tests)
**Status**: Testing only

Enable expensive OCR quality tests. These tests perform full OCR on large documents and are slow (can take minutes).

```bash title="Enable Full OCR Tests"
# Skip expensive OCR tests (default, fast test runs)
# unset KREUZBERG_RUN_FULL_OCR

# Run full OCR quality tests
export KREUZBERG_RUN_FULL_OCR=1

# In test output:
# test test_ocr_quality_multi_page_consistency ... SKIPPED
# Skipping test_ocr_quality_multi_page_consistency: set KREUZBERG_RUN_FULL_OCR=1 to enable
```

**Warning**:

- These tests can take 10+ minutes
- Require OCR backends to be installed and working
- Produce large temporary files
- Use only in CI/CD for comprehensive validation

## Docker Compose Examples

### Basic Configuration

```yaml title="Docker Compose - Basic Setup"
version: "3.8"
services:
  kreuzberg:
    image: kreuzberg:latest
    ports:
      - "3000:3000"
    environment:
      KREUZBERG_HOST: "0.0.0.0"
      KREUZBERG_PORT: "3000"
      KREUZBERG_OCR_LANGUAGE: "eng"
      KREUZBERG_CACHE_ENABLED: "true"
```

### Production Configuration

```yaml title="Docker Compose - Production Setup"
version: "3.8"
services:
  kreuzberg:
    image: kreuzberg:latest
    ports:
      - "8000:8000"
    volumes:
      - kreuzberg_cache:/data/cache
    environment:
      KREUZBERG_HOST: "0.0.0.0"
      KREUZBERG_PORT: "8000"
      KREUZBERG_CORS_ORIGINS: "https://app.example.com, https://admin.example.com"
      KREUZBERG_MAX_REQUEST_BODY_BYTES: "209715200" # 200 MB
      KREUZBERG_MAX_MULTIPART_FIELD_BYTES: "209715200"
      KREUZBERG_CACHE_DIR: "/data/cache"
      KREUZBERG_OCR_LANGUAGE: "eng"
      KREUZBERG_OCR_BACKEND: "tesseract"
      KREUZBERG_CHUNKING_MAX_CHARS: "2000"
      KREUZBERG_CHUNKING_MAX_OVERLAP: "300"
      KREUZBERG_TOKEN_REDUCTION_MODE: "moderate"

volumes:
  kreuzberg_cache:
    driver: local
```

### Multilingual Configuration

```yaml title="Docker Compose - Multilingual Setup"
version: "3.8"
services:
  kreuzberg:
    image: kreuzberg:latest
    ports:
      - "8000:8000"
    environment:
      KREUZBERG_HOST: "0.0.0.0"
      KREUZBERG_PORT: "8000"
      KREUZBERG_OCR_BACKEND: "easyocr" # Better multilingual support
      KREUZBERG_OCR_LANGUAGE: "fra" # French
      KREUZBERG_CACHE_ENABLED: "true"
```

### Development Configuration

```yaml title="Docker Compose - Development Setup"
version: "3.8"
services:
  kreuzberg:
    image: kreuzberg:latest
    ports:
      - "8000:8000"
    environment:
      KREUZBERG_HOST: "127.0.0.1"
      KREUZBERG_PORT: "8000"
      KREUZBERG_CACHE_ENABLED: "false" # Disable for fresh testing
      KREUZBERG_CI_DEBUG: "1" # Enable debug output
      KREUZBERG_DEBUG_OCR: "1"
      KREUZBERG_CACHE_DIR: "/tmp/kreuzberg"
```

## Environment Variable Loading Order

Kreuzberg applies environment variables in this order:

1. Load configuration file (TOML/YAML/JSON) if specified
2. Parse environment variables using `apply_env_overrides()`
3. Validate all settings

This ensures environment variables always win over file configuration:

```rust title="Rust - Applying Environment Overrides"
let mut config = ExtractionConfig::from_file("kreuzberg.toml")?;
config.apply_env_overrides()?;  // Overrides file values
```

## Common Patterns

### Using with Config Files

Combine files with environment overrides for flexibility:

```bash title="Combining Config Files with Env Overrides"
# Load base config from file
# Override specific values for this deployment
export KREUZBERG_OCR_LANGUAGE=deu
export KREUZBERG_CACHE_DIR=/mnt/cache
kreuzberg --config kreuzberg.toml
```

### Shell Script Initialization

```bash title="Environment-Based Shell Script"
#!/bin/bash
# Load deployment-specific settings

if [ "$ENVIRONMENT" = "production" ]; then
  export KREUZBERG_HOST="0.0.0.0"
  export KREUZBERG_CORS_ORIGINS="https://app.example.com"
  export KREUZBERG_CACHE_ENABLED="true"
  export KREUZBERG_MAX_REQUEST_BODY_BYTES=$((200 * 1048576))
elif [ "$ENVIRONMENT" = "development" ]; then
  export KREUZBERG_HOST="127.0.0.1"
  export KREUZBERG_CACHE_ENABLED="false"
  export KREUZBERG_CI_DEBUG="1"
fi

kreuzberg
```

### Kubernetes ConfigMap

```yaml title="Kubernetes ConfigMap and Pod Configuration"
apiVersion: v1
kind: ConfigMap
metadata:
  name: kreuzberg-config
data:
  KREUZBERG_HOST: "0.0.0.0"
  KREUZBERG_PORT: "8000"
  KREUZBERG_CORS_ORIGINS: "https://api.example.com"
  KREUZBERG_CACHE_DIR: "/data/cache"
  KREUZBERG_OCR_BACKEND: "tesseract"
  KREUZBERG_TOKEN_REDUCTION_MODE: "moderate"
---
apiVersion: v1
kind: Pod
metadata:
  name: kreuzberg-server
spec:
  containers:
    - name: kreuzberg
      image: kreuzberg:latest
      ports:
        - containerPort: 8000
      envFrom:
        - configMapRef:
            name: kreuzberg-config
      volumeMounts:
        - name: cache
          mountPath: /data/cache
  volumes:
    - name: cache
      persistentVolumeClaim:
        claimName: kreuzberg-cache-pvc
```

## ONNX Runtime Configuration

### ORT_DYLIB_PATH

**Type**: `String`
**Default**: Not set (bundled CPU ONNX Runtime is used)

Path to a custom ONNX Runtime shared library. Set this to use a GPU-enabled ONNX Runtime instead of the bundled CPU-only version.

Required for GPU acceleration (`cuda`, `tensorrt`) with PaddleOCR, layout detection, embeddings, and document orientation detection.

```bash title="GPU Acceleration Setup"
# Linux — using ONNX Runtime GPU release
export ORT_DYLIB_PATH=/usr/local/lib/libonnxruntime.so

# Linux — using pip-installed onnxruntime-gpu
export ORT_DYLIB_PATH=$(python -c "import onnxruntime; print(onnxruntime.__path__[0])")/capi/libonnxruntime.so

# macOS — using Homebrew
export ORT_DYLIB_PATH=/opt/homebrew/lib/libonnxruntime.dylib

# Windows
set ORT_DYLIB_PATH=C:\path\to\onnxruntime.dll
```

When not set, Kreuzberg auto-discovers system-installed ONNX Runtime on common paths. If no system library is found, the bundled CPU-only version is used.

## See Also

- [Configuration Guide](./configuration.md) - Detailed configuration file format and options
- [File Size Limits](./file-size-limits.md) - Upload and processing limits
- [Types Reference](./types.md) - API type definitions and structures