hjess/fil


Henrik Jess Nielsen b4c07d3693


Nomad changes

2026-06-01 23:40:55 +02:00


{% include 'partials/badges.html.jinja' %}

What This Package Provides

Document intelligence core — extract text, tables, images, metadata, entities, keywords, and code intelligence from one API.
Format coverage — PDF, Office, images, HTML/XML, email, archives, notebooks, citations, scientific formats, and plain text.
OCR choices — Tesseract, PaddleOCR, EasyOCR where supported, VLM OCR through liter-llm, and plugin hooks for custom backends.
Same engine as every binding — Rust, Python, Node.js, Go, Java, PHP, Ruby, .NET, Elixir, R, WASM, Kotlin Android, Swift, Dart, Zig, and C FFI share the same Rust implementation. {% if language == "typescript" %}
Node-first TypeScript API — NAPI-RS package with typed options/results and async extraction. {% elif language == "python" %}
Python package — sync and async APIs with typed results for ingestion, RAG, and data workflows. {% elif language == "go" %}
Go module — context-aware API over the shared native library. {% elif language == "java" %}
Java package — FFM binding for direct native document extraction. {% elif language == "php" %}
PHP package — PHP 8.2+ API with generated types. {% elif language == "ruby" %}
Ruby package — native extension with idiomatic Ruby objects. {% elif language == "csharp" %}
.NET package — async/await API with nullable-aware result types. {% elif language == "elixir" %}
BEAM package — Rustler NIF binding for OTP pipelines. {% elif language == "wasm" %}
WASM package — browser and edge-compatible extraction where native libraries are unavailable. {% elif language == "r" %}
R package — data workflow binding with data-frame-friendly extracted structures. {% elif language == "ffi" %}
C ABI — stable shared library surface for custom hosts and secondary bindings. {% elif language == "kotlin_android" %}
Android AAR — JNI-backed package for mobile extraction workloads. {% elif language == "swift" %}
SwiftPM package — Swift Concurrency API for Apple targets. {% elif language == "dart" %}
Dart package — Future/Stream API through flutter_rust_bridge. {% elif language == "zig" %}
Zig package — wrapper over the C FFI with explicit memory ownership. {% endif %}

Installation

{% include 'partials/installation.md.jinja' %}

Quick Start

{% include 'partials/quick_start.md.jinja' %}

{% if language == "typescript" %} {% include 'partials/napi_implementation.md.jinja' %}

{% endif %}

Features

{% include 'partials/features.md.jinja' %}

{% if features.ocr %}

OCR Support

Kreuzberg supports multiple OCR backends for extracting text from scanned documents and images:

{% for backend in ocr_backends %}

{{ backend | title }} {% endfor %}

OCR Configuration Example

{% endif %} {% if features.async %}

Async Support

This binding provides full async/await support for non-blocking document processing:
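{% if language == "python" %}

As a rough illustration of the non-blocking pattern described above, the sketch below runs several extractions concurrently with a bounded number in flight. It is a minimal sketch, not the package's API: `extract_text` is a hypothetical stand-in coroutine, and `extract_many` and its `limit` parameter are names introduced only for this example. Replace the stand-in with the binding's actual async extraction call.

```python
import asyncio


async def extract_text(path: str) -> str:
    """Hypothetical stand-in for the binding's async extraction call."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return f"text from {path}"


async def extract_many(paths: list[str], limit: int = 4) -> list[str]:
    """Extract several documents concurrently, at most `limit` at a time."""
    gate = asyncio.Semaphore(limit)

    async def worker(path: str) -> str:
        # The semaphore caps how many extractions run at once.
        async with gate:
            return await extract_text(path)

    # gather returns results in the same order as the input paths.
    return await asyncio.gather(*(worker(p) for p in paths))


if __name__ == "__main__":
    print(asyncio.run(extract_many(["a.pdf", "b.docx"])))
```

Bounding concurrency with a semaphore keeps memory use predictable when the input list is large, while results still come back in input order.

{% endif %}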

{% endif %} {% if features.plugin_system %}

Plugin System

Kreuzberg supports extensible post-processing plugins for custom text transformation and filtering.

For detailed plugin documentation, see the Plugin System Guide.

{% if snippets.plugin_system %}

Plugin Example

{% endif %} {% endif %} {% if features.embeddings %}

Embeddings Support

Generate vector embeddings for extracted text using the built-in ONNX Runtime support. This requires ONNX Runtime to be installed.

Embeddings Guide {% endif %}

{% if snippets.batch_processing %}

Batch Processing

Process multiple documents efficiently:

{% endif %}

Configuration

For advanced configuration options including language detection, table extraction, OCR settings, and more:

Configuration Guide

Documentation

Contributing

Contributions are welcome! See the Contributing Guide.

Part of Kreuzberg.dev

Kreuzberg Cloud — managed extraction API with SDKs, dashboards, and observability.
kreuzcrawl — web crawling and scraping with HTML→Markdown and headless-Chrome fallback.
html-to-markdown — fast, lossless HTML→Markdown engine.
liter-llm — universal LLM API client with native bindings for 14 languages and 143 providers.
tree-sitter-language-pack — tree-sitter grammars and code-intelligence primitives.
alef — the polyglot binding generator that produces this README and all per-language bindings.
Discord — community, roadmap, announcements.

License

{{ license }} License — see LICENSE for details.

Support

Discord Community: Join our Discord
GitHub Issues: Report bugs
Discussions: Ask questions


{{ name }}