# Environment Variables Reference Configuration precedence in Kreuzberg follows this order (highest to lowest): 1. **Environment Variables** - Highest priority, overrides all other sources 2. **Configuration Files** - TOML, YAML, or JSON config files 3. **Defaults** - Built-in sensible defaults This document covers all KREUZBERG\_\* environment variables for version 4.3.8. ## When to Use Environment Variables Environment variables are ideal for: - **Container/Cloud Deployments**: Docker, Kubernetes, serverless environments where config files are impractical - **CI/CD Pipelines**: Override settings per environment (dev, staging, production) - **Simple Overrides**: Changing one or two settings without managing a config file - **Secrets Management**: Using secret management systems that inject values as env vars For complex configurations with many settings, configuration files are recommended: ```toml title="Example Configuration File" # kreuzberg.toml is cleaner for multiple settings [ocr] language = "eng" backend = "tesseract" [chunking] max_chars = 2000 max_overlap = 300 ``` ## API Server Configuration These variables control the Kreuzberg server's network behavior and request handling. ### KREUZBERG_HOST **Type**: `String` **Default**: `127.0.0.1` **Valid Values**: Any IPv4 or IPv6 address, or hostname The server bind address. Use `0.0.0.0` to listen on all interfaces. ```bash title="Server Bind Address Examples" # Listen only on localhost (default) export KREUZBERG_HOST=127.0.0.1 # Listen on all interfaces (Docker, cloud deployments) export KREUZBERG_HOST=0.0.0.0 # Listen on specific interface export KREUZBERG_HOST=192.168.1.100 ``` ### KREUZBERG_PORT **Type**: `u16` (1-65535) **Default**: `8000` The server port number. ```bash title="Server Port Examples" export KREUZBERG_PORT=3000 export KREUZBERG_PORT=8080 ``` **Error**: Port must be a valid u16 number: ```text KREUZBERG_PORT must be a valid u16 number, got 'invalid': invalid digit found in string ``` ### KREUZBERG_CORS_ORIGINS **Type**: `String` (comma-separated list) **Default**: Empty (allows all origins) Whitelist of allowed CORS origins. When empty, the server accepts requests from any origin. ```bash title="CORS Origins Configuration" # Allow all origins (default) # unset KREUZBERG_CORS_ORIGINS # Allow specific origins export KREUZBERG_CORS_ORIGINS="https://api.example.com, https://app.example.com" # Single origin export KREUZBERG_CORS_ORIGINS="https://trusted.com" ``` **Security Warning**: Be explicit with CORS origins in production. Allowing all origins (`*`) means any website can call your API on behalf of users. In Kreuzberg, an empty list allows all origins - be intentional about this choice. ```bash title="CORS Security Best Practices" # Production: Restrict to known origins export KREUZBERG_CORS_ORIGINS="https://app.mycompany.com, https://admin.mycompany.com" # Development: Can use wildcard, but understand the security implications # Don't use wildcard in production unless absolutely necessary ``` ### KREUZBERG_MAX_REQUEST_BODY_BYTES **Type**: `usize` (bytes) **Default**: `104857600` (100 MB) Maximum size of HTTP request bodies. Prevents oversized requests from consuming server resources. ```bash title="Max Request Body Size Examples" # 50 MB export KREUZBERG_MAX_REQUEST_BODY_BYTES=52428800 # 200 MB export KREUZBERG_MAX_REQUEST_BODY_BYTES=209715200 # 500 MB export KREUZBERG_MAX_REQUEST_BODY_BYTES=524288000 ``` **Note**: Both `KREUZBERG_MAX_REQUEST_BODY_BYTES` and `KREUZBERG_MAX_MULTIPART_FIELD_BYTES` control upload limits. Adjust both for consistent behavior. ### KREUZBERG_MAX_MULTIPART_FIELD_BYTES **Type**: `usize` (bytes) **Default**: `104857600` (100 MB) Maximum size of individual multipart form fields. Controls the size of file uploads in multipart requests. ```bash title="Max Multipart Field Size Examples" # 100 MB (default) export KREUZBERG_MAX_MULTIPART_FIELD_BYTES=104857600 # 500 MB for large document processing export KREUZBERG_MAX_MULTIPART_FIELD_BYTES=524288000 # 1 GB for extreme cases export KREUZBERG_MAX_MULTIPART_FIELD_BYTES=1073741824 ``` ## Extraction Configuration These variables control document extraction behavior, including OCR, text chunking, and caching. ### KREUZBERG_OCR_LANGUAGE **Type**: `String` (ISO 639-1 or 639-3 language code) **Default**: `eng` (English) OCR language for scanned documents. Must be a valid language code recognized by the OCR backend. ```bash title="OCR Language Configuration" # English (default) export KREUZBERG_OCR_LANGUAGE=eng # German export KREUZBERG_OCR_LANGUAGE=deu # French export KREUZBERG_OCR_LANGUAGE=fra # Spanish export KREUZBERG_OCR_LANGUAGE=spa # Chinese (Simplified) export KREUZBERG_OCR_LANGUAGE=chi_sim # Japanese export KREUZBERG_OCR_LANGUAGE=jpn ``` **Supported Codes**: Language codes are backend-agnostic and automatically mapped to the appropriate format for each backend: - **Tesseract codes** (ISO 639-3): `eng`, `deu`, `fra`, `spa`, `ita`, `por`, `rus`, `chi_sim`, `chi_tra`, `jpn`, `kor` - **PaddleOCR codes**: `en`, `ch`, `french`, `german`, `korean`, `thai`, `greek`, `cyrillic`, `latin`, `arabic`, `devanagari`, `tamil`, `telugu` - **ISO 639-1 codes**: `en`, `de`, `fr`, `es`, `ja`, `ko`, `zh`, `ru`, `ar`, `th`, `el` All code formats are accepted regardless of backend — Kreuzberg automatically maps between them. ### KREUZBERG_OCR_BACKEND **Type**: `String` **Default**: `tesseract` **Valid Values**: `tesseract`, `easyocr`, `paddleocr` OCR engine to use for text extraction from images and scanned documents. ```bash title="OCR Backend Selection" # Tesseract (open source, good for English) export KREUZBERG_OCR_BACKEND=tesseract # EasyOCR (better multilingual support, slower) export KREUZBERG_OCR_BACKEND=easyocr # PaddleOCR (fast, good accuracy across languages) export KREUZBERG_OCR_BACKEND=paddleocr ``` **Performance Notes**: - **tesseract**: Fastest, best for English and Latin scripts - **easyocr**: Slower, excellent multilingual support - **paddleocr**: Fast with good accuracy for many languages ### KREUZBERG_CHUNKING_MAX_CHARS **Type**: `usize` (positive integer) **Default**: `1000` (characters) Maximum number of characters per text chunk. Smaller chunks are useful for LLM context windows. ```bash title="Chunk Size Configuration" # Small chunks for token-constrained LLMs export KREUZBERG_CHUNKING_MAX_CHARS=512 # Default: balanced for most use cases export KREUZBERG_CHUNKING_MAX_CHARS=1000 # Larger chunks for fewer splits export KREUZBERG_CHUNKING_MAX_CHARS=2000 # Very large chunks for comprehensive context export KREUZBERG_CHUNKING_MAX_CHARS=4000 ``` **Validation**: Must be greater than 0. Must be greater than `KREUZBERG_CHUNKING_MAX_OVERLAP`. ### KREUZBERG_CHUNKING_MAX_OVERLAP **Type**: `usize` (non-negative integer) **Default**: `200` (characters) Character overlap between consecutive chunks. Maintains context across chunk boundaries. ```bash title="Chunk Overlap Configuration" # No overlap (creates discontinuities) export KREUZBERG_CHUNKING_MAX_OVERLAP=0 # Default: 20% overlap with 1000-char chunks export KREUZBERG_CHUNKING_MAX_OVERLAP=200 # More overlap: 30% for better context continuity export KREUZBERG_CHUNKING_MAX_OVERLAP=300 # High overlap for sensitive documents export KREUZBERG_CHUNKING_MAX_OVERLAP=500 ``` **Validation**: Must be less than `KREUZBERG_CHUNKING_MAX_CHARS`. **Example Error**: ```text Chunking overlap (500) cannot be greater than or equal to max_chars (1000) ``` ### KREUZBERG_CACHE_ENABLED **Type**: `Boolean` (`true` or `false`, case-insensitive) **Default**: `true` Enable or disable extraction result caching. Cache stores results to avoid reprocessing identical documents. ```bash title="Cache Enable/Disable" # Enable cache (default, recommended for production) export KREUZBERG_CACHE_ENABLED=true # Disable cache (development, testing, or when cache is problematic) export KREUZBERG_CACHE_ENABLED=false # Case insensitive export KREUZBERG_CACHE_ENABLED=TRUE export KREUZBERG_CACHE_ENABLED=False ``` ### KREUZBERG_OUTPUT_FORMAT **Type**: `String` **Default**: `plain` **Valid Values**: `plain`, `markdown`, `djot`, `html` Controls the text content format of extraction results. Determines how extracted text is formatted in the result output. ```bash title="Output Format Options" # Plain text content only (default) export KREUZBERG_OUTPUT_FORMAT=plain # Markdown formatted output export KREUZBERG_OUTPUT_FORMAT=markdown # Djot markup format export KREUZBERG_OUTPUT_FORMAT=djot # HTML formatted output export KREUZBERG_OUTPUT_FORMAT=html ``` **Use Cases**: | Format | Use Case | | ---------- | --------------------------------------------------------------- | | `plain` | Raw extracted text without formatting | | `markdown` | Structured text with headings, lists, emphasis (RAG, LLM input) | | `djot` | Lightweight markup, alternative to Markdown | | `html` | Rich formatted output for web display | **Example:** ```bash title="Extract with markdown formatting" export KREUZBERG_OUTPUT_FORMAT=markdown kreuzberg ``` ### KREUZBERG_TOKEN_REDUCTION_MODE **Type**: `String` **Default**: `off` **Valid Values**: `off`, `light`, `moderate`, `aggressive`, `maximum` Token reduction aggressiveness for compressing extracted text while preserving meaning. Useful when working with token-limited LLMs. ```bash title="Token Reduction Mode Options" # No reduction (keep all text as-is) export KREUZBERG_TOKEN_REDUCTION_MODE=off # Light reduction: Remove common stopwords, minimal impact export KREUZBERG_TOKEN_REDUCTION_MODE=light # Moderate reduction: Balance between compression and meaning preservation export KREUZBERG_TOKEN_REDUCTION_MODE=moderate # Aggressive reduction: Significant compression, some detail loss export KREUZBERG_TOKEN_REDUCTION_MODE=aggressive # Maximum reduction: Extreme compression for token-constrained scenarios export KREUZBERG_TOKEN_REDUCTION_MODE=maximum ``` **Impact on Tokens**: | Mode | Typical Reduction | Use Case | | ------------ | ----------------- | ------------------------------------------- | | `off` | 0% | Full preservation, no compression | | `light` | 10-15% | Minimal impact, clean up obvious redundancy | | `moderate` | 25-35% | Balanced approach for most scenarios | | `aggressive` | 40-50% | Significant compression, still readable | | `maximum` | 50-70% | Extreme compression, lose some detail | ## Runtime Configuration Control cache location, debug output, and runtime behavior. ### KREUZBERG_CACHE_DIR **Type**: `String` (file system path) **Default**: Platform-specific global cache directory Override the default cache directory for storing extraction cache, models, and intermediate files. When unset, Kreuzberg uses a platform-appropriate global cache: - **Linux**: `~/.cache/kreuzberg/` (or `$XDG_CACHE_HOME/kreuzberg/`) - **macOS**: `~/Library/Caches/kreuzberg/` - **Windows**: `%LOCALAPPDATA%/kreuzberg/` If the platform cache directory cannot be determined, Kreuzberg falls back to `~/.cache/kreuzberg/`, then `.kreuzberg/` in the current working directory as a last resort. ```bash title="Cache Directory Configuration" # Default: uses platform-specific global cache (recommended) # unset KREUZBERG_CACHE_DIR # Store cache in specific location export KREUZBERG_CACHE_DIR=/var/cache/kreuzberg # Docker: Use volume mount export KREUZBERG_CACHE_DIR=/data/kreuzberg-cache # Development: Quick local cleanup export KREUZBERG_CACHE_DIR=/tmp/kreuzberg-cache ``` **Directory Structure**: Kreuzberg creates subdirectories for different cache types: ```text $KREUZBERG_CACHE_DIR/ ocr/ # OCR result cache embeddings/ # Chunk embedding cache extractions/ # Full extraction cache ``` ### KREUZBERG_CI_DEBUG **Type**: `Boolean` (presence check: set to any value to enable) **Default**: Disabled (unset) Enable detailed debug logging for CI environments. Outputs step-by-step timing and parameter information for OCR operations. ```bash title="Enable CI Debug Logging" # Enable CI debug output export KREUZBERG_CI_DEBUG=1 export KREUZBERG_CI_DEBUG=true export KREUZBERG_CI_DEBUG=yes # Output example: # [kreuzberg::ocr] perform_ocr:start bytes=1024000 language=eng output=text use_cache=true # [kreuzberg::ocr] perform_ocr:end duration_ms=2534 ``` **Use Cases**: - Debugging slow OCR operations - Tracing cache hits/misses - Performance profiling in CI pipelines - Understanding extraction pipeline behavior ### KREUZBERG_DEBUG_OCR **Type**: `Boolean` (presence check: set to any value to enable) **Default**: Disabled (unset) Enable OCR-specific debug output. Outputs diagnostic information about OCR decisions, fallbacks, and text coverage metrics. ```bash title="Enable OCR Debug Logging" # Enable OCR debug logging export KREUZBERG_DEBUG_OCR=1 # Output example: # [kreuzberg::pdf::ocr] fallback=true non_whitespace=8543 alnum=7234 meaningful_words=312 # [kreuzberg::pdf::ocr] avg_non_whitespace=45.2 avg_alnum=38.1 alnum_ratio=0.847 ``` **Diagnostic Information**: - Whether OCR fallback was triggered - Character counts (whitespace, alphanumeric) - Word counts and coverage ratios - Coverage thresholds and decisions ## Memory & Performance Configure caching for string encoding operations to optimize performance. ### KREUZBERG_ENCODING_CACHE_MAX_ENTRIES **Type**: `usize` (positive integer) **Default**: `10000` Maximum number of strings cached in the encoding cache. Each entry consumes memory proportional to string length. ```bash title="Encoding Cache Entry Limit" # Default: reasonable for most applications export KREUZBERG_ENCODING_CACHE_MAX_ENTRIES=10000 # Higher for very large batches export KREUZBERG_ENCODING_CACHE_MAX_ENTRIES=50000 # Lower to reduce memory usage export KREUZBERG_ENCODING_CACHE_MAX_ENTRIES=1000 ``` ### KREUZBERG_ENCODING_CACHE_MAX_BYTES **Type**: `usize` (bytes) **Default**: `104857600` (100 MB) Maximum total size of cached strings in bytes. Once exceeded, least-used entries are evicted. ```bash title="Encoding Cache Size Limit" # Default: 100 MB export KREUZBERG_ENCODING_CACHE_MAX_BYTES=104857600 # Larger cache for high-throughput scenarios export KREUZBERG_ENCODING_CACHE_MAX_BYTES=524288000 # 500 MB # Smaller cache for memory-constrained environments export KREUZBERG_ENCODING_CACHE_MAX_BYTES=10485760 # 10 MB ``` ## LLM Integration Configure LLM-powered features such as structured extraction, vision-based OCR, and provider-hosted embeddings. ### KREUZBERG_LLM_MODEL **Type**: `String` **Default**: None (must be set explicitly or via config) Default LLM model for structured extraction. Uses [liter-llm](https://github.com/kreuzberg-dev/liter-llm) model format (`provider/model-name`). ```bash title="LLM Model Configuration" # OpenAI export KREUZBERG_LLM_MODEL=openai/gpt-4o-mini # Anthropic export KREUZBERG_LLM_MODEL=anthropic/claude-sonnet-4-20250514 # Local provider export KREUZBERG_LLM_MODEL=ollama/llama3 ``` ### KREUZBERG_LLM_API_KEY **Type**: `String` **Default**: None API key for the structured extraction LLM provider. When not set, liter-llm falls back to provider-standard environment variables (for example, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`). ```bash title="LLM API Key Configuration" export KREUZBERG_LLM_API_KEY=sk-... ``` **Security Warning**: Prefer using provider-standard environment variables or a secrets manager over setting this directly. This variable is provided for cases where multiple providers are used and explicit key routing is needed. ### KREUZBERG_LLM_BASE_URL **Type**: `String` **Default**: None (uses provider default) Custom base URL for the structured extraction LLM provider. Useful for self-hosted models, proxies, or alternative API-compatible endpoints. ```bash title="LLM Base URL Configuration" # Custom OpenAI-compatible endpoint export KREUZBERG_LLM_BASE_URL=https://api.example.com # Local Ollama instance export KREUZBERG_LLM_BASE_URL=http://localhost:11434 ``` ### KREUZBERG_VLM_OCR_MODEL **Type**: `String` **Default**: None (must be set explicitly or via config) VLM (Vision Language Model) model for vision-based OCR. When configured, Kreuzberg can use a vision model as an OCR backend, sending document images directly to the VLM for text extraction. ```bash title="VLM OCR Model Configuration" # OpenAI GPT-4o for vision OCR export KREUZBERG_VLM_OCR_MODEL=openai/gpt-4o # Anthropic Claude for vision OCR export KREUZBERG_VLM_OCR_MODEL=anthropic/claude-sonnet-4-20250514 ``` ### KREUZBERG_VLM_EMBEDDING_MODEL **Type**: `String` **Default**: None (must be set explicitly or via config) LLM model for provider-hosted embeddings. Instead of running local ONNX embedding models, Kreuzberg can delegate embedding generation to a cloud provider's embedding API. ```bash title="VLM Embedding Model Configuration" # OpenAI embeddings export KREUZBERG_VLM_EMBEDDING_MODEL=openai/text-embedding-3-small # Cohere embeddings export KREUZBERG_VLM_EMBEDDING_MODEL=cohere/embed-english-v3.0 ``` **Note**: When `api_key` is not set in config, liter-llm falls back to provider-standard environment variables (for example, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`). | Variable | Description | Example | | ------------------------------- | -------------------------------------------------- | ------------------------------- | | `KREUZBERG_LLM_MODEL` | Default LLM model for structured extraction | `openai/gpt-4o-mini` | | `KREUZBERG_LLM_API_KEY` | API key for structured extraction LLM provider | `sk-...` | | `KREUZBERG_LLM_BASE_URL` | Custom base URL for structured extraction provider | `https://api.example.com` | | `KREUZBERG_VLM_OCR_MODEL` | VLM model for vision-based OCR | `openai/gpt-4o` | | `KREUZBERG_VLM_EMBEDDING_MODEL` | LLM model for provider-hosted embeddings | `openai/text-embedding-3-small` | --- ## Testing Variables Variables for development, testing, and quality assurance. ### KREUZBERG_RUN_FULL_OCR **Type**: `Boolean` (presence check: set to any value to enable) **Default**: Disabled (skips expensive tests) **Status**: Testing only Enable expensive OCR quality tests. These tests perform full OCR on large documents and are slow (can take minutes). ```bash title="Enable Full OCR Tests" # Skip expensive OCR tests (default, fast test runs) # unset KREUZBERG_RUN_FULL_OCR # Run full OCR quality tests export KREUZBERG_RUN_FULL_OCR=1 # In test output: # test test_ocr_quality_multi_page_consistency ... SKIPPED # Skipping test_ocr_quality_multi_page_consistency: set KREUZBERG_RUN_FULL_OCR=1 to enable ``` **Warning**: - These tests can take 10+ minutes - Require OCR backends to be installed and working - Produce large temporary files - Use only in CI/CD for comprehensive validation ## Docker Compose Examples ### Basic Configuration ```yaml title="Docker Compose - Basic Setup" version: "3.8" services: kreuzberg: image: kreuzberg:latest ports: - "3000:3000" environment: KREUZBERG_HOST: "0.0.0.0" KREUZBERG_PORT: "3000" KREUZBERG_OCR_LANGUAGE: "eng" KREUZBERG_CACHE_ENABLED: "true" ``` ### Production Configuration ```yaml title="Docker Compose - Production Setup" version: "3.8" services: kreuzberg: image: kreuzberg:latest ports: - "8000:8000" volumes: - kreuzberg_cache:/data/cache environment: KREUZBERG_HOST: "0.0.0.0" KREUZBERG_PORT: "8000" KREUZBERG_CORS_ORIGINS: "https://app.example.com, https://admin.example.com" KREUZBERG_MAX_REQUEST_BODY_BYTES: "209715200" # 200 MB KREUZBERG_MAX_MULTIPART_FIELD_BYTES: "209715200" KREUZBERG_CACHE_DIR: "/data/cache" KREUZBERG_OCR_LANGUAGE: "eng" KREUZBERG_OCR_BACKEND: "tesseract" KREUZBERG_CHUNKING_MAX_CHARS: "2000" KREUZBERG_CHUNKING_MAX_OVERLAP: "300" KREUZBERG_TOKEN_REDUCTION_MODE: "moderate" volumes: kreuzberg_cache: driver: local ``` ### Multilingual Configuration ```yaml title="Docker Compose - Multilingual Setup" version: "3.8" services: kreuzberg: image: kreuzberg:latest ports: - "8000:8000" environment: KREUZBERG_HOST: "0.0.0.0" KREUZBERG_PORT: "8000" KREUZBERG_OCR_BACKEND: "easyocr" # Better multilingual support KREUZBERG_OCR_LANGUAGE: "fra" # French KREUZBERG_CACHE_ENABLED: "true" ``` ### Development Configuration ```yaml title="Docker Compose - Development Setup" version: "3.8" services: kreuzberg: image: kreuzberg:latest ports: - "8000:8000" environment: KREUZBERG_HOST: "127.0.0.1" KREUZBERG_PORT: "8000" KREUZBERG_CACHE_ENABLED: "false" # Disable for fresh testing KREUZBERG_CI_DEBUG: "1" # Enable debug output KREUZBERG_DEBUG_OCR: "1" KREUZBERG_CACHE_DIR: "/tmp/kreuzberg" ``` ## Environment Variable Loading Order Kreuzberg applies environment variables in this order: 1. Load configuration file (TOML/YAML/JSON) if specified 2. Parse environment variables using `apply_env_overrides()` 3. Validate all settings This ensures environment variables always win over file configuration: ```rust title="Rust - Applying Environment Overrides" let mut config = ExtractionConfig::from_file("kreuzberg.toml")?; config.apply_env_overrides()?; // Overrides file values ``` ## Common Patterns ### Using with Config Files Combine files with environment overrides for flexibility: ```bash title="Combining Config Files with Env Overrides" # Load base config from file # Override specific values for this deployment export KREUZBERG_OCR_LANGUAGE=deu export KREUZBERG_CACHE_DIR=/mnt/cache kreuzberg --config kreuzberg.toml ``` ### Shell Script Initialization ```bash title="Environment-Based Shell Script" #!/bin/bash # Load deployment-specific settings if [ "$ENVIRONMENT" = "production" ]; then export KREUZBERG_HOST="0.0.0.0" export KREUZBERG_CORS_ORIGINS="https://app.example.com" export KREUZBERG_CACHE_ENABLED="true" export KREUZBERG_MAX_REQUEST_BODY_BYTES=$((200 * 1048576)) elif [ "$ENVIRONMENT" = "development" ]; then export KREUZBERG_HOST="127.0.0.1" export KREUZBERG_CACHE_ENABLED="false" export KREUZBERG_CI_DEBUG="1" fi kreuzberg ``` ### Kubernetes ConfigMap ```yaml title="Kubernetes ConfigMap and Pod Configuration" apiVersion: v1 kind: ConfigMap metadata: name: kreuzberg-config data: KREUZBERG_HOST: "0.0.0.0" KREUZBERG_PORT: "8000" KREUZBERG_CORS_ORIGINS: "https://api.example.com" KREUZBERG_CACHE_DIR: "/data/cache" KREUZBERG_OCR_BACKEND: "tesseract" KREUZBERG_TOKEN_REDUCTION_MODE: "moderate" --- apiVersion: v1 kind: Pod metadata: name: kreuzberg-server spec: containers: - name: kreuzberg image: kreuzberg:latest ports: - containerPort: 8000 envFrom: - configMapRef: name: kreuzberg-config volumeMounts: - name: cache mountPath: /data/cache volumes: - name: cache persistentVolumeClaim: claimName: kreuzberg-cache-pvc ``` ## ONNX Runtime Configuration ### ORT_DYLIB_PATH **Type**: `String` **Default**: Not set (bundled CPU ONNX Runtime is used) Path to a custom ONNX Runtime shared library. Set this to use a GPU-enabled ONNX Runtime instead of the bundled CPU-only version. Required for GPU acceleration (`cuda`, `tensorrt`) with PaddleOCR, layout detection, embeddings, and document orientation detection. ```bash title="GPU Acceleration Setup" # Linux — using ONNX Runtime GPU release export ORT_DYLIB_PATH=/usr/local/lib/libonnxruntime.so # Linux — using pip-installed onnxruntime-gpu export ORT_DYLIB_PATH=$(python -c "import onnxruntime; print(onnxruntime.__path__[0])")/capi/libonnxruntime.so # macOS — using Homebrew export ORT_DYLIB_PATH=/opt/homebrew/lib/libonnxruntime.dylib # Windows set ORT_DYLIB_PATH=C:\path\to\onnxruntime.dll ``` When not set, Kreuzberg auto-discovers system-installed ONNX Runtime on common paths. If no system library is found, the bundled CPU-only version is used. ## See Also - [Configuration Guide](./configuration.md) - Detailed configuration file format and options - [File Size Limits](./file-size-limits.md) - Upload and processing limits - [Types Reference](./types.md) - API type definitions and structures