templates/readme/language_package.md

# {{ name }}

{% include 'partials/badges.html.jinja' %}

{{ description }}

## What This Package Provides

- **Document intelligence core** — extract text, tables, images, metadata, entities, keywords, and code intelligence from one API.
- **Format coverage** — PDF, Office, images, HTML/XML, email, archives, notebooks, citations, scientific formats, and plain text.
- **OCR choices** — Tesseract, PaddleOCR, EasyOCR where supported, VLM OCR through liter-llm, and plugin hooks for custom backends.
- **Same engine as every binding** — Rust, Python, Node.js, Go, Java, PHP, Ruby, .NET, Elixir, R, WASM, Kotlin Android, Swift, Dart, Zig, and C FFI share the same Rust implementation.
{% if language == "typescript" %}
- **Node-first TypeScript API** — NAPI-RS package with typed options/results and async extraction.
{% elif language == "python" %}
- **Python package** — sync and async APIs with typed results for ingestion, RAG, and data workflows.
{% elif language == "go" %}
- **Go module** — context-aware API over the shared native library.
{% elif language == "java" %}
- **Java package** — FFM binding for direct native document extraction.
{% elif language == "php" %}
- **PHP package** — PHP 8.2+ API with generated types.
{% elif language == "ruby" %}
- **Ruby package** — native extension with idiomatic Ruby objects.
{% elif language == "csharp" %}
- **.NET package** — async/await API with nullable-aware result types.
{% elif language == "elixir" %}
- **BEAM package** — Rustler NIF binding for OTP pipelines.
{% elif language == "wasm" %}
- **WASM package** — browser and edge-compatible extraction where native libraries are unavailable.
{% elif language == "r" %}
- **R package** — data workflow binding with data-frame-friendly extracted structures.
{% elif language == "ffi" %}
- **C ABI** — stable shared library surface for custom hosts and secondary bindings.
{% elif language == "kotlin_android" %}
- **Android AAR** — JNI-backed package for mobile extraction workloads.
{% elif language == "swift" %}
- **SwiftPM package** — Swift Concurrency API for Apple targets.
{% elif language == "dart" %}
- **Dart package** — Future/Stream API through flutter_rust_bridge.
{% elif language == "zig" %}
- **Zig package** — wrapper over the C FFI with explicit memory ownership.
{% endif %}

## Installation

{% include 'partials/installation.md.jinja' %}

## Quick Start

{% include 'partials/quick_start.md.jinja' %}

{% if language == "typescript" %}
{% include 'partials/napi_implementation.md.jinja' %}

{% endif %}

## Features

{% include 'partials/features.md.jinja' %}

{% if features.ocr %}

## OCR Support

Kreuzberg supports multiple OCR backends for extracting text from scanned documents and images:

{% for backend in ocr_backends %}

- **{{ backend | title }}**
  {% endfor %}

### OCR Configuration Example

{{ snippets.ocr_configuration | include_snippet(language) }}

{% endif %}
{% if features.async %}

## Async Support

This binding provides full async/await support for non-blocking document processing:

{{ snippets.async_extraction | include_snippet(language) }}

{% endif %}
{% if features.plugin_system %}

## Plugin System

Kreuzberg supports extensible post-processing plugins for custom text transformation and filtering.

For detailed plugin documentation, visit [Plugin System Guide](https://docs.kreuzberg.dev/guides/plugins/).

{% if snippets.plugin_system %}

### Plugin Example

{{ snippets.plugin_system | include_snippet(language) }}

{% endif %}
{% endif %}
{% if features.embeddings %}

## Embeddings Support

Generate vector embeddings for extracted text using the built-in ONNX Runtime support. Requires ONNX Runtime installation.

**[Embeddings Guide](https://docs.kreuzberg.dev/features/#embeddings)**
{% endif %}

{% if snippets.batch_processing %}

## Batch Processing

Process multiple documents efficiently:

{{ snippets.batch_processing | include_snippet(language) }}

{% endif %}

## Configuration

For advanced configuration options including language detection, table extraction, OCR settings, and more:

**[Configuration Guide](https://docs.kreuzberg.dev/guides/configuration/)**

## Documentation

- **[Official Documentation](https://docs.kreuzberg.dev/)**
- **[API Reference](https://docs.kreuzberg.dev/reference/api-python/)**
- **[Examples & Guides](https://docs.kreuzberg.dev/)**

## Contributing

Contributions are welcome! See [Contributing Guide](https://github.com/kreuzberg-dev/kreuzberg/blob/main/CONTRIBUTING.md).

## Part of Kreuzberg.dev

- [Kreuzberg Cloud](https://github.com/kreuzberg-dev/kreuzberg-cloud) — managed extraction API with SDKs, dashboards, and observability.
- [kreuzcrawl](https://github.com/kreuzberg-dev/kreuzcrawl) — web crawling and scraping with HTML→Markdown and headless-Chrome fallback.
- [html-to-markdown](https://github.com/kreuzberg-dev/html-to-markdown) — fast, lossless HTML→Markdown engine.
- [liter-llm](https://github.com/kreuzberg-dev/liter-llm) — universal LLM API client with native bindings for 14 languages and 143 providers.
- [tree-sitter-language-pack](https://github.com/kreuzberg-dev/tree-sitter-language-pack) — tree-sitter grammars and code-intelligence primitives.
- [alef](https://github.com/kreuzberg-dev/alef) — the polyglot binding generator that produces this README and all per-language bindings.
- [Discord](https://discord.gg/xt9WY3GnKR) — community, roadmap, announcements.

## License

{{ license }} License — see [LICENSE](../../LICENSE) for details.

## Support

- **Discord Community**: [Join our Discord](https://discord.gg/xt9WY3GnKR)
- **GitHub Issues**: [Report bugs](https://github.com/kreuzberg-dev/kreuzberg/issues)
- **Discussions**: [Ask questions](https://github.com/kreuzberg-dev/kreuzberg/discussions)
Nomad changes 2026-06-01 23:40:55 +02:00			`# {{ name }}`

			`{% include 'partials/badges.html.jinja' %}`

			`{{ description }}`

			`## What This Package Provides`

			`- Document intelligence core — extract text, tables, images, metadata, entities, keywords, and code intelligence from one API.`
			`- Format coverage — PDF, Office, images, HTML/XML, email, archives, notebooks, citations, scientific formats, and plain text.`
			`- OCR choices — Tesseract, PaddleOCR, EasyOCR where supported, VLM OCR through liter-llm, and plugin hooks for custom backends.`
			`- Same engine as every binding — Rust, Python, Node.js, Go, Java, PHP, Ruby, .NET, Elixir, R, WASM, Kotlin Android, Swift, Dart, Zig, and C FFI share the same Rust implementation.`
			`{% if language == "typescript" %}`
			`- Node-first TypeScript API — NAPI-RS package with typed options/results and async extraction.`
			`{% elif language == "python" %}`
			`- Python package — sync and async APIs with typed results for ingestion, RAG, and data workflows.`
			`{% elif language == "go" %}`
			`- Go module — context-aware API over the shared native library.`
			`{% elif language == "java" %}`
			`- Java package — FFM binding for direct native document extraction.`
			`{% elif language == "php" %}`
			`- PHP package — PHP 8.2+ API with generated types.`
			`{% elif language == "ruby" %}`
			`- Ruby package — native extension with idiomatic Ruby objects.`
			`{% elif language == "csharp" %}`
			`- .NET package — async/await API with nullable-aware result types.`
			`{% elif language == "elixir" %}`
			`- BEAM package — Rustler NIF binding for OTP pipelines.`
			`{% elif language == "wasm" %}`
			`- WASM package — browser and edge-compatible extraction where native libraries are unavailable.`
			`{% elif language == "r" %}`
			`- R package — data workflow binding with data-frame-friendly extracted structures.`
			`{% elif language == "ffi" %}`
			`- C ABI — stable shared library surface for custom hosts and secondary bindings.`
			`{% elif language == "kotlin_android" %}`
			`- Android AAR — JNI-backed package for mobile extraction workloads.`
			`{% elif language == "swift" %}`
			`- SwiftPM package — Swift Concurrency API for Apple targets.`
			`{% elif language == "dart" %}`
			`- Dart package — Future/Stream API through flutter_rust_bridge.`
			`{% elif language == "zig" %}`
			`- Zig package — wrapper over the C FFI with explicit memory ownership.`
			`{% endif %}`

			`## Installation`

			`{% include 'partials/installation.md.jinja' %}`

			`## Quick Start`

			`{% include 'partials/quick_start.md.jinja' %}`

			`{% if language == "typescript" %}`
			`{% include 'partials/napi_implementation.md.jinja' %}`

			`{% endif %}`

			`## Features`

			`{% include 'partials/features.md.jinja' %}`

			`{% if features.ocr %}`

			`## OCR Support`

			`Kreuzberg supports multiple OCR backends for extracting text from scanned documents and images:`

			`{% for backend in ocr_backends %}`

			`- {{ backend \| title }}`
			`{% endfor %}`

			`### OCR Configuration Example`

			`{{ snippets.ocr_configuration \| include_snippet(language) }}`

			`{% endif %}`
			`{% if features.async %}`

			`## Async Support`

			`This binding provides full async/await support for non-blocking document processing:`

			`{{ snippets.async_extraction \| include_snippet(language) }}`

			`{% endif %}`
			`{% if features.plugin_system %}`

			`## Plugin System`

			`Kreuzberg supports extensible post-processing plugins for custom text transformation and filtering.`

			`For detailed plugin documentation, visit [Plugin System Guide](https://docs.kreuzberg.dev/guides/plugins/).`

			`{% if snippets.plugin_system %}`

			`### Plugin Example`

			`{{ snippets.plugin_system \| include_snippet(language) }}`

			`{% endif %}`
			`{% endif %}`
			`{% if features.embeddings %}`

			`## Embeddings Support`

			`Generate vector embeddings for extracted text using the built-in ONNX Runtime support. Requires ONNX Runtime installation.`

			`[Embeddings Guide](https://docs.kreuzberg.dev/features/#embeddings)`
			`{% endif %}`

			`{% if snippets.batch_processing %}`

			`## Batch Processing`

			`Process multiple documents efficiently:`

			`{{ snippets.batch_processing \| include_snippet(language) }}`

			`{% endif %}`

			`## Configuration`

			`For advanced configuration options including language detection, table extraction, OCR settings, and more:`

			`[Configuration Guide](https://docs.kreuzberg.dev/guides/configuration/)`

			`## Documentation`

			`- [Official Documentation](https://docs.kreuzberg.dev/)`
			`- [API Reference](https://docs.kreuzberg.dev/reference/api-python/)`
			`- [Examples & Guides](https://docs.kreuzberg.dev/)`

			`## Contributing`

			`Contributions are welcome! See [Contributing Guide](https://github.com/kreuzberg-dev/kreuzberg/blob/main/CONTRIBUTING.md).`

			`## Part of Kreuzberg.dev`

			`- [Kreuzberg Cloud](https://github.com/kreuzberg-dev/kreuzberg-cloud) — managed extraction API with SDKs, dashboards, and observability.`
			`- [kreuzcrawl](https://github.com/kreuzberg-dev/kreuzcrawl) — web crawling and scraping with HTML→Markdown and headless-Chrome fallback.`
			`- [html-to-markdown](https://github.com/kreuzberg-dev/html-to-markdown) — fast, lossless HTML→Markdown engine.`
			`- [liter-llm](https://github.com/kreuzberg-dev/liter-llm) — universal LLM API client with native bindings for 14 languages and 143 providers.`
			`- [tree-sitter-language-pack](https://github.com/kreuzberg-dev/tree-sitter-language-pack) — tree-sitter grammars and code-intelligence primitives.`
			`- [alef](https://github.com/kreuzberg-dev/alef) — the polyglot binding generator that produces this README and all per-language bindings.`
			`- [Discord](https://discord.gg/xt9WY3GnKR) — community, roadmap, announcements.`

			`## License`

			`{{ license }} License — see [LICENSE](../../LICENSE) for details.`

			`## Support`

			`- Discord Community: [Join our Discord](https://discord.gg/xt9WY3GnKR)`
			`- GitHub Issues: [Report bugs](https://github.com/kreuzberg-dev/kreuzberg/issues)`
			`- Discussions: [Ask questions](https://github.com/kreuzberg-dev/kreuzberg/discussions)`