2026-06-01 23:40:55 +02:00
# Kreuzberg
> High-performance document intelligence library with a Rust core and bindings for 12 languages. Extract text, metadata, and structured data from 90+ file formats including PDF, DOCX, PPTX, XLSX, HTML, images (with OCR), email, archives, and more.

Kreuzberg provides async/sync APIs, text chunking, language detection, keyword extraction, and an extensible plugin system. It is 10-50x faster than pure-Python alternatives thanks to its Rust core.
## Getting Started
- [Installation](https://docs.kreuzberg.dev/getting-started/installation/): Install Kreuzberg for Python, TypeScript, Ruby, Rust, Go, Java, C#, PHP, Elixir, or WebAssembly
- [Quick Start](https://docs.kreuzberg.dev/getting-started/quickstart/): Get started with document extraction in minutes
## Concepts
- [Architecture](https://docs.kreuzberg.dev/concepts/architecture/): How the Rust core is shared across the polyglot language bindings
- [Extraction Pipeline](https://docs.kreuzberg.dev/concepts/extraction-pipeline/): How document extraction works
- [Plugin System](https://docs.kreuzberg.dev/concepts/plugin-system/): Extend Kreuzberg with custom extractors and backends
- [MIME Detection](https://docs.kreuzberg.dev/guides/extraction/#mime-type-detection): Automatic file format detection
- [Performance](https://docs.kreuzberg.dev/guides/development/#performance): Performance characteristics and optimization
## Guides
- [Extraction Basics](https://docs.kreuzberg.dev/guides/extraction/): Core extraction functionality
- [Configuration](https://docs.kreuzberg.dev/guides/configuration/): Configuration options
- [OCR](https://docs.kreuzberg.dev/guides/ocr/): Optical character recognition setup with Tesseract, EasyOCR, or PaddleOCR
- [Element-Based Output](https://docs.kreuzberg.dev/guides/output-formats/#element-based-output): Structured element extraction
- [Document Structure](https://docs.kreuzberg.dev/guides/output-formats/#document-structure): Hierarchical tree-based document output
- [PDF Hierarchy](https://docs.kreuzberg.dev/guides/output-formats/#pdf-hierarchy-detection): PDF hierarchy detection using K-means clustering
- [Keyword Extraction](https://docs.kreuzberg.dev/guides/keywords/): YAKE and RAKE keyword extraction
- [Advanced Features](https://docs.kreuzberg.dev/guides/advanced/): Advanced usage patterns
- [Docker Deployment](https://docs.kreuzberg.dev/guides/docker/): Docker container deployment
- [Kubernetes Deployment](https://docs.kreuzberg.dev/guides/kubernetes/): Kubernetes deployment guide
- [API Server](https://docs.kreuzberg.dev/guides/api-server/): REST API server setup
- [Creating Plugins](https://docs.kreuzberg.dev/guides/plugins/): Build custom extractor plugins
- [AI Coding Assistants](https://docs.kreuzberg.dev/guides/agent-skills/): Agent skills for AI coding assistants
- [C# Bindings](https://docs.kreuzberg.dev/guides/csharp/): C# binding usage guide
## CLI
- [CLI Usage](https://docs.kreuzberg.dev/cli/usage/): Command-line interface reference
## API Reference
- [Python API](https://docs.kreuzberg.dev/reference/api-python/): Python binding reference
- [TypeScript API](https://docs.kreuzberg.dev/reference/api-typescript/): TypeScript/Node.js binding reference
- [WebAssembly API](https://docs.kreuzberg.dev/reference/api-wasm/): WebAssembly binding reference
- [PHP API](https://docs.kreuzberg.dev/reference/api-php/): PHP binding reference
- [C# API](https://docs.kreuzberg.dev/reference/api-csharp/): C# binding reference
- [Go API](https://docs.kreuzberg.dev/reference/api-go/): Go binding reference
- [Java API](https://docs.kreuzberg.dev/reference/api-java/): Java binding reference
- [Rust API](https://docs.kreuzberg.dev/reference/api-rust/): Rust core library reference
- [Ruby API](https://docs.kreuzberg.dev/reference/api-ruby/): Ruby binding reference
- [R API](https://docs.kreuzberg.dev/reference/api-r/): R binding reference
- [Elixir API](https://docs.kreuzberg.dev/reference/api-elixir/): Elixir binding reference
- [C API](https://docs.kreuzberg.dev/reference/api-c/): C FFI binding reference
- [Configuration Reference](https://docs.kreuzberg.dev/reference/configuration/): Full configuration options
- [Environment Variables](https://docs.kreuzberg.dev/reference/environment-variables/): KREUZBERG_* environment variable reference
- [File Size Limits](https://docs.kreuzberg.dev/reference/file-size-limits/): Upload and processing size limits
- [Format Support](https://docs.kreuzberg.dev/reference/formats/): Supported file formats (90+)
- [Types](https://docs.kreuzberg.dev/reference/types/): Type definitions
- [Errors](https://docs.kreuzberg.dev/reference/errors/): Error types and handling
## Migration
- [From Unstructured](https://docs.kreuzberg.dev/migration/from-unstructured/): Migrate from the Unstructured library
## Optional
- [Features Overview](https://docs.kreuzberg.dev/features/): Complete feature list
- [Changelog](https://docs.kreuzberg.dev/CHANGELOG/): Release history
- [Contributing](https://docs.kreuzberg.dev/contributing/): Contribution guidelines
- [Comparisons](https://docs.kreuzberg.dev/comparisons/kreuzberg-vs-unstructured/): Kreuzberg vs Unstructured