Henrik Jess Nielsen b4c07d3693
2026-06-01 23:40:55 +02:00

# Configuration Guide v4.0.0

All extraction behavior is controlled through `ExtractionConfig`. Pass it directly in code or load it from a TOML, YAML, or JSON file. Every field is optional. For per-field documentation, see the Configuration Reference.

## Quick Start

=== "Python"

    --8<-- "snippets/python/config/config_basic.md"

=== "TypeScript"

    --8<-- "snippets/typescript/config/config_basic.md"

=== "Rust"

    --8<-- "snippets/rust/config/config_basic.md"

=== "Go"

    --8<-- "snippets/go/config/config_basic.md"

=== "Java"

    --8<-- "snippets/java/config/config_basic.md"

=== "C#"

    --8<-- "snippets/csharp/config_basic.md"

=== "Ruby"

    --8<-- "snippets/ruby/config/config_basic.md"

=== "R"

    --8<-- "snippets/r/config/config_basic.md"

## Configuration Files

Three formats are supported. TOML is recommended.

=== "TOML (Recommended)"

    ```toml title="kreuzberg.toml"
    use_cache = true
    enable_quality_processing = true

    [ocr]
    backend = "tesseract"
    language = "eng"

    [ocr.tesseract_config]
    psm = 3
    ```

=== "YAML"

    ```yaml title="kreuzberg.yaml"
    use_cache: true
    enable_quality_processing: true

    ocr:
      backend: tesseract
      language: eng
      tesseract_config:
        psm: 3
    ```

=== "JSON"

    ```json title="kreuzberg.json"
    {
      "use_cache": true,
      "enable_quality_processing": true,
      "ocr": {
        "backend": "tesseract",
        "language": "eng",
        "tesseract_config": {
          "psm": 3
        }
      }
    }
    ```

### Automatic Discovery

When no `--config` path is supplied, Kreuzberg walks up from the current working directory looking for `kreuzberg.toml` and uses the first match. YAML and JSON files are supported only when passed explicitly via `--config`. If nothing is found, defaults are used.

=== "Python"

    --8<-- "snippets/python/config/config_discover.md"

=== "TypeScript"

    --8<-- "snippets/typescript/config/config_discover.md"

=== "Rust"

    --8<-- "snippets/rust/config/config_discover.md"

=== "Go"

    --8<-- "snippets/go/config/config_discover.md"

=== "Java"

    --8<-- "snippets/java/config/config_discover.md"

=== "C#"

    --8<-- "snippets/csharp/config_discover.md"

=== "Ruby"

    --8<-- "snippets/ruby/config/config_discover.md"

=== "R"

    --8<-- "snippets/r/config/config_discover.md"

=== "Wasm"

    --8<-- "snippets/wasm/config/config_discover.md"

## Common Use Cases

### Setting Up OCR

=== "Python"

    --8<-- "snippets/python/config/config_ocr.md"

=== "TypeScript"

    --8<-- "snippets/typescript/config/config_ocr.md"

=== "Rust"

    --8<-- "snippets/rust/ocr/config_ocr.md"

=== "Go"

    --8<-- "snippets/go/config/config_ocr.md"

=== "Java"

    --8<-- "snippets/java/config/config_ocr.md"

=== "C#"

    --8<-- "snippets/csharp/config_ocr.md"

=== "Ruby"

    --8<-- "snippets/ruby/config/config_ocr.md"

=== "R"

    --8<-- "snippets/r/config/config_ocr.md"

For backend selection and language packs, see the OCR Guide. For fine-grained Tesseract tuning, see the `TesseractConfig` Reference.
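
The `language` and `psm` fields correspond to Tesseract's own `-l` and `--psm` command-line options. `psm = 3` is Tesseract's fully automatic page segmentation mode, which is also its default. The hypothetical `tesseract_cli_args` helper below maps the `[ocr]` table from the configuration-file examples onto those options. It makes no assumption about how Kreuzberg invokes Tesseract internally, and it validates only `psm`.

```python
from typing import Any


def tesseract_cli_args(ocr: dict[str, Any]) -> list[str]:
    """Translate an [ocr] table into the matching tesseract CLI options (illustrative only)."""
    if ocr["backend"] != "tesseract":
        raise ValueError("this sketch only covers the tesseract backend")
    # `language` maps to -l; Tesseract accepts combined packs such as "eng+deu".
    args = ["-l", ocr["language"]]
    psm = ocr.get("tesseract_config", {}).get("psm")
    if psm is not None:
        # Tesseract defines page segmentation modes 0 through 13.
        if not 0 <= psm <= 13:
            raise ValueError(f"psm must be between 0 and 13, got {psm}")
        args += ["--psm", str(psm)]
    return args


ocr_table = {"backend": "tesseract", "language": "eng", "tesseract_config": {"psm": 3}}
print(["tesseract", "page.png", "stdout", *tesseract_cli_args(ocr_table)])
# prints ['tesseract', 'page.png', 'stdout', '-l', 'eng', '--psm', '3']
```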

### Chunking for RAG

=== "Python"

    --8<-- "snippets/python/utils/chunking.md"

=== "TypeScript"

    --8<-- "snippets/typescript/utils/chunking.md"

=== "Rust"

    --8<-- "snippets/rust/advanced/chunking.md"

=== "Go"

    --8<-- "snippets/go/utils/chunking.md"

=== "Java"

    --8<-- "snippets/java/utils/chunking.md"

=== "C#"

    --8<-- "snippets/csharp/advanced/embedding_with_chunking.md"

=== "Ruby"

    --8<-- "snippets/ruby/utils/chunking.md"

=== "R"

    --8<-- "snippets/r/utils/chunking.md"
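
Chunking splits extracted text into pieces small enough for an embedding model. Neighboring pieces usually overlap so that context near a boundary appears in both chunks. The sketch below shows the simplest form of this idea: fixed-size character windows with a fixed overlap. Kreuzberg's own chunker may choose split points differently. `chunk_text`, `max_chars`, and `overlap` are illustrative names, not its API; the language tabs above show the real options.

```python
def chunk_text(text: str, max_chars: int, overlap: int) -> list[str]:
    """Split text into windows of at most max_chars, each overlapping the previous by `overlap` chars."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must satisfy 0 <= overlap < max_chars")
    if not text:
        return []
    step = max_chars - overlap  # how far each window advances
    chunks: list[str] = []
    start = 0
    while True:
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):  # the last window reached the end of the text
            break
        start += step
    return chunks
```

For example, `chunk_text("abcdefghij", 4, 1)` returns `["abcd", "defg", "ghij"]`: each chunk starts three characters after the previous one and repeats its last character.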

## All Configuration Categories

## Next Steps