# {{ name }}

{% include 'partials/badges.html.jinja' %}

{{ description }}

## What This Package Provides

- **Document intelligence core** — extract text, tables, images, metadata, entities, keywords, and code intelligence from one API.
- **Format coverage** — PDF, Office, images, HTML/XML, email, archives, notebooks, citations, scientific formats, and plain text.
- **OCR choices** — Tesseract, PaddleOCR, EasyOCR where supported, VLM OCR through liter-llm, and plugin hooks for custom backends.
- **Same engine as every binding** — Rust, Python, Node.js, Go, Java, PHP, Ruby, .NET, Elixir, R, WASM, Kotlin Android, Swift, Dart, Zig, and C FFI share the same Rust implementation.
{% if language == "typescript" %}
- **Node-first TypeScript API** — NAPI-RS package with typed options/results and async extraction.
{% elif language == "python" %}
- **Python package** — sync and async APIs with typed results for ingestion, RAG, and data workflows.
{% elif language == "go" %}
- **Go module** — context-aware API over the shared native library.
{% elif language == "java" %}
- **Java package** — FFM binding for direct native document extraction.
{% elif language == "php" %}
- **PHP package** — PHP 8.2+ API with generated types.
{% elif language == "ruby" %}
- **Ruby package** — native extension with idiomatic Ruby objects.
{% elif language == "csharp" %}
- **.NET package** — async/await API with nullable-aware result types.
{% elif language == "elixir" %}
- **BEAM package** — Rustler NIF binding for OTP pipelines.
{% elif language == "wasm" %}
- **WASM package** — browser and edge-compatible extraction where native libraries are unavailable.
{% elif language == "r" %}
- **R package** — data workflow binding with data-frame-friendly extracted structures.
{% elif language == "ffi" %}
- **C ABI** — stable shared library surface for custom hosts and secondary bindings.
{% elif language == "kotlin_android" %}
- **Android AAR** — JNI-backed package for mobile extraction workloads.
{% elif language == "swift" %}
- **SwiftPM package** — Swift Concurrency API for Apple targets.
{% elif language == "dart" %}
- **Dart package** — Future/Stream API through flutter_rust_bridge.
{% elif language == "zig" %}
- **Zig package** — wrapper over the C FFI with explicit memory ownership.
{% endif %}

## Installation

{% include 'partials/installation.md.jinja' %}

## Quick Start

{% include 'partials/quick_start.md.jinja' %}

{% if language == "typescript" %}
{% include 'partials/napi_implementation.md.jinja' %}

{% endif %}

## Features

{% include 'partials/features.md.jinja' %}

{% if features.ocr %}

## OCR Support

Kreuzberg supports multiple OCR backends for extracting text from scanned documents and images:

{% for backend in ocr_backends %}
- **{{ backend | title }}**
{% endfor %}

### OCR Configuration Example

{{ snippets.ocr_configuration | include_snippet(language) }}

{% endif %}
{% if features.async %}

## Async Support

This binding exposes an async/await API for non-blocking document processing:

{{ snippets.async_extraction | include_snippet(language) }}

{% endif %}
{% if features.plugin_system %}

## Plugin System

Kreuzberg supports extensible post-processing plugins for custom text transformation and filtering.

For detailed plugin documentation, see the [Plugin System Guide](https://docs.kreuzberg.dev/guides/plugins/).

{% if snippets.plugin_system %}

### Plugin Example

{{ snippets.plugin_system | include_snippet(language) }}

{% endif %}
{% endif %}
{% if features.embeddings %}

## Embeddings Support

Generate vector embeddings for extracted text through the ONNX Runtime integration. ONNX Runtime must be installed for this feature to work.

**[Embeddings Guide](https://docs.kreuzberg.dev/features/#embeddings)**
{% endif %}

{% if snippets.batch_processing %}

## Batch Processing

Process multiple documents efficiently:

{{ snippets.batch_processing | include_snippet(language) }}

{% endif %}

## Configuration

For advanced configuration options, including language detection, table extraction, and OCR settings, see the guide:

**[Configuration Guide](https://docs.kreuzberg.dev/guides/configuration/)**

## Documentation

- **[Official Documentation](https://docs.kreuzberg.dev/)**
- **[API Reference](https://docs.kreuzberg.dev/reference/api-python/)**
- **[Examples & Guides](https://docs.kreuzberg.dev/)**

## Contributing

Contributions are welcome! See the [Contributing Guide](https://github.com/kreuzberg-dev/kreuzberg/blob/main/CONTRIBUTING.md).

## Part of Kreuzberg.dev

- [Kreuzberg Cloud](https://github.com/kreuzberg-dev/kreuzberg-cloud) — managed extraction API with SDKs, dashboards, and observability.
- [kreuzcrawl](https://github.com/kreuzberg-dev/kreuzcrawl) — web crawling and scraping with HTML→Markdown and headless-Chrome fallback.
- [html-to-markdown](https://github.com/kreuzberg-dev/html-to-markdown) — fast, lossless HTML→Markdown engine.
- [liter-llm](https://github.com/kreuzberg-dev/liter-llm) — universal LLM API client with native bindings for 14 languages and 143 providers.
- [tree-sitter-language-pack](https://github.com/kreuzberg-dev/tree-sitter-language-pack) — tree-sitter grammars and code-intelligence primitives.
- [alef](https://github.com/kreuzberg-dev/alef) — the polyglot binding generator that produces this README and all per-language bindings.
- [Discord](https://discord.gg/xt9WY3GnKR) — community, roadmap, announcements.

## License

{{ license }} License — see [LICENSE](../../LICENSE) for details.

## Support

- **Discord Community**: [Join our Discord](https://discord.gg/xt9WY3GnKR)
- **GitHub Issues**: [Report bugs](https://github.com/kreuzberg-dev/kreuzberg/issues)
- **Discussions**: [Ask questions](https://github.com/kreuzberg-dev/kreuzberg/discussions)