Kreuzberg
Document intelligence with a Rust core and native bindings for 17 languages. Extract text, tables, and metadata from 90+ formats with optional OCR — usable as an SDK, CLI, REST API, MCP server, or Docker image.
:material-play-circle: Live Demo{ .md-button .md-button--primary } :material-lightning-bolt: Quick Start{ .md-button } :material-package-variant: Installation{ .md-button } :fontawesome-brands-discord: Join our Community{ .md-button }
Why Kreuzberg
- :material-flash:{ .lg .middle } High Performance

    Rust core with native PDFium, SIMD optimizations, and full parallelism. Process thousands of documents per minute without a GPU.

- :material-file-document-multiple:{ .lg .middle } 90+ File Formats

    PDF, DOCX, XLSX, PPTX, images, HTML, XML, emails, archives, and academic formats, all handled through one API.

- :material-eye:{ .lg .middle } Multi-Engine OCR

    Tesseract and PaddleOCR work across all language bindings. EasyOCR is available for Python only.

- :material-translate:{ .lg .middle } 16 Language Bindings

    Native bindings for Python, TypeScript, Rust, Go, Java, Kotlin, C#, Ruby, PHP, Elixir, R, Dart, Swift, Zig, C, and WebAssembly.

- :material-code-tags:{ .lg .middle } Code Intelligence

    Extract functions, classes, imports, symbols, and docstrings from 300+ programming languages. Results are returned in the `code_intelligence` field, with semantic chunking.

- :material-puzzle:{ .lg .middle } Plugin System

    Register custom extractors, OCR backends, post-processors, and validators. Plugin authoring is primarily supported in Python; all bindings can consume registered plugins.

- :material-server:{ .lg .middle } Flexible Deployment

    Use as a library, CLI tool, REST API server, MCP server, or Docker container. Pick what fits your stack.
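The throughput point above comes down to fanning a batch of documents out over a pool of workers. The following is a minimal sketch of that pattern from the caller's side, not Kreuzberg's API: `extract_many` and `fake_extract` are hypothetical names, and `fake_extract` is a stand-in for whichever extraction function your binding provides.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable


def extract_many(
    paths: Iterable[Path],
    extract: Callable[[Path], str],
    max_workers: int = 8,
) -> dict[Path, str | Exception]:
    """Run `extract` over every path in parallel, capturing failures per file."""
    path_list = list(paths)

    def safe_extract(path: Path) -> str | Exception:
        try:
            return extract(path)
        except Exception as exc:  # one bad file should not abort the batch
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        results = list(pool.map(safe_extract, path_list))
    return dict(zip(path_list, results))


def fake_extract(path: Path) -> str:
    # Hypothetical stand-in so the sketch runs without Kreuzberg installed.
    if path.suffix == ".bad":
        raise ValueError(f"unsupported file: {path.name}")
    return f"text of {path.name}"


batch = extract_many(
    [Path("a.pdf"), Path("b.docx"), Path("c.bad")],
    fake_extract,
)
```

Threads only pay off here if the underlying extraction call releases the GIL, which this sketch assumes; for a pure-Python `extract`, a process pool would be the closer fit.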
Language Support
| Language | Package | Docs |
|---|---|---|
| Python | `pip install kreuzberg` | API Reference |
| TypeScript (Native) | `npm install @kreuzberg/node` | API Reference |
| TypeScript (WASM) | `npm install @kreuzberg/wasm` | API Reference |
| Rust | `cargo add kreuzberg` | API Reference |
| Go | `go get github.com/kreuzberg-dev/kreuzberg/v5` | API Reference |
| Java | Maven Central `dev.kreuzberg:kreuzberg` | API Reference |
| Kotlin | Maven Central `dev.kreuzberg:kreuzberg-kotlin` | API Reference |
| C# | `dotnet add package Kreuzberg` | API Reference |
| Ruby | `gem install kreuzberg` | API Reference |
| PHP | `composer require kreuzberg/kreuzberg` | API Reference |
| Elixir | `{:kreuzberg, "~> 5.0.0-rc.1"}` | API Reference |
| R | r-universe `kreuzberg` | API Reference |
| Dart / Flutter | `dart pub add kreuzberg` | API Reference |
| Swift | Swift Package Manager | API Reference |
| Zig | `zig fetch --save` from GitHub | API Reference |
| C (FFI) | Shared library + header | API Reference |
| CLI | `brew install kreuzberg-dev/tap/kreuzberg` | CLI Guide |
| Docker | `ghcr.io/kreuzberg-dev/kreuzberg` | Docker Guide |
!!! Tip "Choosing Between TypeScript Packages"
    **`@kreuzberg/node`**: Use for Node.js servers and CLI tools. Runs at full native speed.

    **`@kreuzberg/wasm`**: Use for browsers, Cloudflare Workers, Deno, Bun, and serverless environments. Runs at 60-80% of native speed but works across platforms.
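The package choice in the tip can be written down as a small lookup, which is handy in scaffolding or install scripts. This is only an illustrative sketch: the function name and the runtime labels are assumptions made here, and only the two package names come from the table above.

```python
# Runtime labels are hypothetical identifiers chosen for this sketch.
NATIVE_RUNTIMES = {"node"}
WASM_RUNTIMES = {"browser", "cloudflare-workers", "deno", "bun", "serverless"}


def pick_typescript_package(runtime: str) -> str:
    """Map a target runtime to the TypeScript package suggested in the tip."""
    key = runtime.strip().lower()
    if key in NATIVE_RUNTIMES:
        return "@kreuzberg/node"  # native binding for Node.js servers and CLI tools
    if key in WASM_RUNTIMES:
        return "@kreuzberg/wasm"  # portable build, slower than native
    raise ValueError(f"unknown runtime: {runtime!r}")


package_for_node = pick_typescript_package("node")
package_for_workers = pick_typescript_package("Cloudflare-Workers")
```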
Explore the Docs
- :material-rocket-launch:{ .lg .middle } Getting Started

    Install Kreuzberg and extract your first document in minutes.

- :material-book-open-variant:{ .lg .middle } Guides

    Configuration, OCR setup, Docker deployment, plugins, and more.

- :material-puzzle-outline:{ .lg .middle } Concepts

    Architecture, extraction pipeline, MIME detection, and performance.

- :material-api:{ .lg .middle } API Reference

    Complete API docs for every language binding, types, and errors.

- :material-console:{ .lg .middle } CLI & Servers

    Command-line tool, REST API server, and MCP server for AI agents.

- :material-swap-horizontal:{ .lg .middle } Migration

    Migrate from Unstructured or other document extraction libraries.
Getting Help
- Bugs & feature requests — Open an issue on GitHub
- Community chat — Join the Discord
- Contributing — Read the contributor guide