---
description: "Kreuzberg: extract text, tables, and metadata from 90+ file formats with a Rust core and native bindings for 16 languages. No GPU required."
---
# Kreuzberg

Document intelligence with a Rust core and native bindings for 16 languages. Extract text, tables, and metadata from 90+ formats with optional OCR, and use it as an SDK, CLI, REST API, MCP server, or Docker image.

<div class="hero-badges" markdown>

[:material-play-circle: Live Demo](demo.html){ .md-button .md-button--primary }
[:material-lightning-bolt: Quick Start](getting-started/quickstart.md){ .md-button }
[:material-package-variant: Installation](getting-started/installation.md){ .md-button }
[:fontawesome-brands-discord: Join our Community](https://discord.gg/xt9WY3GnKR){ .md-button }

</div>

---

## Why Kreuzberg

<div class="grid cards" markdown>

- :material-flash:{ .lg .middle } **High Performance**

    Rust core with native PDFium, SIMD optimizations, and full parallelism. Process thousands of documents per minute without a GPU.

- :material-file-document-multiple:{ .lg .middle } **90+ File Formats**

    PDF, DOCX, XLSX, PPTX, images, HTML, XML, emails, archives, and academic formats: one API handles them all.

- :material-eye:{ .lg .middle } **Multi-Engine OCR**

    Tesseract and PaddleOCR work across all language bindings. EasyOCR is available for Python only.

- :material-translate:{ .lg .middle } **16 Language Bindings**

    Native bindings for Python, TypeScript, Rust, Go, Java, Kotlin, C#, Ruby, PHP, Elixir, R, Dart, Swift, Zig, C, and WebAssembly.

- :material-code-tags:{ .lg .middle } **Code Intelligence**

    Extract functions, classes, imports, symbols, and docstrings from 300+ programming languages. Results are returned in the `code_intelligence` field, with semantic chunking.

- :material-puzzle:{ .lg .middle } **Plugin System**

    Register custom extractors, OCR backends, post-processors, and validators. Plugin authoring is primarily supported in Python; all bindings can consume registered plugins.

- :material-server:{ .lg .middle } **Flexible Deployment**

    Use as a library, CLI tool, REST API server, MCP server, or Docker container. Pick what fits your stack.

</div>

**[See all features](features.md)**
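
To make the plugin vocabulary more concrete, here is a minimal Python sketch of how a registry of extractors, post-processors, and validators might fit together. It is **not** Kreuzberg's plugin API: `PluginRegistry`, `Result`, and all of their methods are hypothetical names chosen for illustration, OCR backends are left out, and running post-processors before validators is an assumption of this sketch rather than documented behavior.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical result type for this sketch; not Kreuzberg's real result object.
@dataclass
class Result:
    content: str
    mime_type: str
    metadata: dict = field(default_factory=dict)


Extractor = Callable[[bytes], str]          # raw bytes -> extracted text
PostProcessor = Callable[[Result], Result]  # transforms a result
Validator = Callable[[Result], None]        # raises to reject a result


class PluginRegistry:
    """Hypothetical registry: one extractor per MIME type, ordered post-processors and validators."""

    def __init__(self) -> None:
        self._extractors: dict[str, Extractor] = {}
        self._post_processors: list[PostProcessor] = []
        self._validators: list[Validator] = []

    def register_extractor(self, mime_type: str, extractor: Extractor) -> None:
        # A later registration for the same MIME type replaces the earlier one.
        self._extractors[mime_type] = extractor

    def register_post_processor(self, processor: PostProcessor) -> None:
        self._post_processors.append(processor)

    def register_validator(self, validator: Validator) -> None:
        self._validators.append(validator)

    def extract(self, data: bytes, mime_type: str) -> Result:
        extractor = self._extractors.get(mime_type)
        if extractor is None:
            raise ValueError(f"no extractor registered for {mime_type!r}")
        result = Result(content=extractor(data), mime_type=mime_type)
        # Assumed order: post-processors first, in registration order...
        for processor in self._post_processors:
            result = processor(result)
        # ...then validators, which inspect the final result and raise to reject it.
        for validator in self._validators:
            validator(result)
        return result


# Example plugins (all hypothetical).
def strip_whitespace(result: Result) -> Result:
    text = result.content.strip()
    return Result(text, result.mime_type, {**result.metadata, "chars": len(text)})


def reject_empty(result: Result) -> None:
    if not result.content:
        raise ValueError("extraction produced no text")


registry = PluginRegistry()
registry.register_extractor("text/plain", lambda data: data.decode("utf-8"))
registry.register_post_processor(strip_whitespace)
registry.register_validator(reject_empty)

result = registry.extract(b"  hello world \n", "text/plain")
print(result.content, result.metadata)  # hello world {'chars': 11}
```

Keying extractors by MIME type lets a single `extract` entry point serve many formats; a real pipeline would detect the MIME type from the input instead of taking it as an argument. The actual registration functions are covered in the plugin documentation under Guides.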

---

## Language Support

| Language | Package | Docs |
| :---------------------- | :--------------------------------------------- | :------------------------------------------- |
| **Python** | `pip install kreuzberg` | [API Reference](reference/api-python.md) |
| **TypeScript (Native)** | `npm install @kreuzberg/node` | [API Reference](reference/api-typescript.md) |
| **TypeScript (WASM)** | `npm install @kreuzberg/wasm` | [API Reference](reference/api-wasm.md) |
| **Rust** | `cargo add kreuzberg` | [API Reference](reference/api-rust.md) |
| **Go** | `go get github.com/kreuzberg-dev/kreuzberg/v5` | [API Reference](reference/api-go.md) |
| **Java** | Maven Central `dev.kreuzberg:kreuzberg` | [API Reference](reference/api-java.md) |
| **Kotlin** | Maven Central `dev.kreuzberg:kreuzberg-kotlin` | [API Reference](reference/api-kotlin.md) |
| **C#** | `dotnet add package Kreuzberg` | [API Reference](reference/api-csharp.md) |
| **Ruby** | `gem install kreuzberg` | [API Reference](reference/api-ruby.md) |
| **PHP** | `composer require kreuzberg/kreuzberg` | [API Reference](reference/api-php.md) |
| **Elixir** | `{:kreuzberg, "~> 5.0.0-rc.1"}` | [API Reference](reference/api-elixir.md) |
| **R** | r-universe `kreuzberg` | [API Reference](reference/api-r.md) |
| **Dart / Flutter** | `dart pub add kreuzberg` | [API Reference](reference/api-dart.md) |
| **Swift** | Swift Package Manager | [API Reference](reference/api-swift.md) |
| **Zig** | `zig fetch --save` from GitHub | [API Reference](reference/api-zig.md) |
| **C (FFI)** | Shared library + header | [API Reference](reference/api-c.md) |
| **CLI** | `brew install kreuzberg-dev/tap/kreuzberg` | [CLI Guide](cli/usage.md) |
| **Docker** | `ghcr.io/kreuzberg-dev/kreuzberg` | [Docker Guide](guides/docker.md) |

!!! tip "Choosing Between TypeScript Packages"

    **`@kreuzberg/node`**: Use for Node.js servers and CLI tools. Runs at full native speed.

    **`@kreuzberg/wasm`**: Use for browsers, Cloudflare Workers, Deno, Bun, and serverless environments. Portable across platforms, at roughly 60-80% of native speed.

---

## Explore the Docs

<div class="grid cards" markdown>

- :material-rocket-launch:{ .lg .middle } **Getting Started**

    Install Kreuzberg and extract your first document in minutes.

    [:octicons-arrow-right-24: Quick Start](getting-started/quickstart.md)

- :material-book-open-variant:{ .lg .middle } **Guides**

    Configuration, OCR setup, Docker deployment, plugins, and more.

    [:octicons-arrow-right-24: All Guides](guides/extraction.md)

- :material-puzzle-outline:{ .lg .middle } **Concepts**

    Architecture, extraction pipeline, MIME detection, and performance.

    [:octicons-arrow-right-24: Architecture](concepts/architecture.md)

- :material-api:{ .lg .middle } **API Reference**

    Complete API docs for every language binding, including types and errors.

    [:octicons-arrow-right-24: References](reference/api-python.md)

- :material-console:{ .lg .middle } **CLI & Servers**

    Command-line tool, REST API server, and MCP server for AI agents.

    [:octicons-arrow-right-24: CLI Usage](cli/usage.md)

- :material-swap-horizontal:{ .lg .middle } **Migration**

    Migrate from Unstructured or other document extraction libraries.

    [:octicons-arrow-right-24: Migration Guides](migration/from-unstructured.md)

</div>

---

## Getting Help
- **Bugs & feature requests** — [Open an issue on GitHub](https://github.com/kreuzberg-dev/kreuzberg/issues)
- **Community chat** — [Join the Discord](https://discord.gg/xt9WY3GnKR)
- **Contributing** — [Read the contributor guide](contributing.md)