481 lines
16 KiB
Markdown
481 lines
16 KiB
Markdown
---
|
|
description: "Install Kreuzberg — pick Python, TypeScript, Rust, Go, CLI/Docker, or any of 12 supported languages."
|
|
---
|
|
|
|
# Installation
|
|
|
|
Native bindings for 17 languages plus a standalone CLI. Every package ships **prebuilt binaries** for Linux (x86_64/aarch64), macOS (Apple Silicon), and Windows — no compile step needed.
|
|
|
|
<div class="cli-hero" markdown>
|
|
|
|
## :material-console: CLI / Docker { #cli--docker }
|
|
|
|
No SDK, no code — just your terminal.
|
|
|
|
=== "Install script"
|
|
|
|
```bash
|
|
curl -fsSL https://raw.githubusercontent.com/kreuzberg-dev/kreuzberg/main/scripts/install.sh | bash
|
|
```
|
|
|
|
=== "Homebrew"
|
|
|
|
```bash
|
|
brew install kreuzberg-dev/tap/kreuzberg
|
|
```
|
|
|
|
=== "Cargo"
|
|
|
|
```bash
|
|
cargo install kreuzberg-cli
|
|
```
|
|
|
|
=== "Docker (CLI image)"
|
|
|
|
```bash
|
|
docker pull ghcr.io/kreuzberg-dev/kreuzberg-cli:latest
|
|
docker run -v $(pwd):/data ghcr.io/kreuzberg-dev/kreuzberg-cli:latest extract /data/document.pdf
|
|
```
|
|
|
|
=== "Docker (full image)"
|
|
|
|
```bash
|
|
docker pull ghcr.io/kreuzberg-dev/kreuzberg:latest
|
|
```
|
|
|
|
[CLI Usage](../cli/usage.md){ .install-btn .install-btn--ghost }
|
|
[API Server Guide](../guides/api-server.md){ .install-btn .install-btn--solid }
|
|
|
|
</div>
|
|
|
|
!!! Warning "x86_64 CPU — AVX/AVX2 instruction set required"
|
|
|
|
The bundled ONNX Runtime binaries require **AVX/AVX2** CPU instructions. CPUs without AVX support (e.g. Intel Atom, Celeron N5105/Jasper Lake, older pre-2011 processors) will crash with an `invalid opcode` trap when using ONNX-dependent features. The affected features are **PaddleOCR**, **layout detection**, and **embeddings**. All other Kreuzberg functionality (text extraction, Tesseract OCR, chunking, metadata, etc.) works normally on any x86_64 CPU. ARM platforms (aarch64) are unaffected.
|
|
|
|
!!! Warning "Windows — ONNX Runtime required for Go, Elixir, and C/C++"
|
|
|
|
Go, Elixir, and C/C++ bindings on Windows link against ONNX Runtime dynamically. You must have `onnxruntime.dll` on your `PATH` at runtime. Download it from the [ONNX Runtime releases](https://github.com/microsoft/onnxruntime/releases) (for example `onnxruntime-win-x64-1.24.1.zip`). Python, TypeScript, Java, C#, Ruby, PHP, and Wasm are unaffected.
|
|
|
|
## Choose your language
|
|
|
|
<div class="grid cards install-cards" markdown>
|
|
|
|
- :fontawesome-brands-python:{ .lg .middle } **Python**
|
|
|
|
***
|
|
|
|
```bash
|
|
pip install kreuzberg
|
|
```
|
|
|
|
[API Reference](../reference/api-python.md){ .install-api-link }
|
|
[:material-lightning-bolt: Quick Start](quickstart.md){ .install-btn .install-btn--solid .install-btn--sm }
|
|
|
|
- :fontawesome-brands-node-js:{ .lg .middle } **TypeScript (Node.js / Bun)**
|
|
|
|
***
|
|
|
|
```bash
|
|
npm install @kreuzberg/node
|
|
```
|
|
|
|
[API Reference](../reference/api-typescript.md){ .install-api-link }
|
|
[:material-lightning-bolt: Quick Start](#typescript){ .install-btn .install-btn--solid .install-btn--sm }
|
|
|
|
- :fontawesome-brands-js:{ .lg .middle } **TypeScript (Browser / Edge)**
|
|
|
|
***
|
|
|
|
```bash
|
|
npm install @kreuzberg/wasm
|
|
```
|
|
|
|
[API Reference](../reference/api-wasm.md){ .install-api-link }
|
|
[:material-lightning-bolt: Quick Start](#typescript){ .install-btn .install-btn--solid .install-btn--sm }
|
|
|
|
- :fontawesome-brands-rust:{ .lg .middle } **Rust**
|
|
|
|
***
|
|
|
|
```bash
|
|
cargo add kreuzberg
|
|
```
|
|
|
|
[API Reference](../reference/api-rust.md){ .install-api-link }
|
|
[:material-lightning-bolt: Quick Start](quickstart.md){ .install-btn .install-btn--solid .install-btn--sm }
|
|
|
|
- :fontawesome-brands-golang:{ .lg .middle } **Go**
|
|
|
|
***
|
|
|
|
```bash
|
|
go get github.com/kreuzberg-dev/kreuzberg/v5@latest
|
|
```
|
|
|
|
[API Reference](../reference/api-go.md){ .install-api-link }
|
|
[:material-lightning-bolt: Quick Start](quickstart.md){ .install-btn .install-btn--solid .install-btn--sm }
|
|
|
|
- :fontawesome-brands-java:{ .lg .middle } **Java**
|
|
|
|
***
|
|
|
|
```gradle
|
|
implementation 'dev.kreuzberg:kreuzberg:5.0.0-rc.1'
|
|
```
|
|
|
|
[API Reference](../reference/api-java.md){ .install-api-link }
|
|
[:material-lightning-bolt: Quick Start](#java){ .install-btn .install-btn--solid .install-btn--sm }
|
|
|
|
- :fontawesome-brands-kotlin:{ .lg .middle } **Kotlin**
|
|
|
|
***
|
|
|
|
```kotlin
|
|
implementation("dev.kreuzberg:kreuzberg-kotlin:5.0.0-rc.1")
|
|
```
|
|
|
|
[API Reference](../reference/api-kotlin.md){ .install-api-link }
|
|
[:material-lightning-bolt: Quick Start](#kotlin){ .install-btn .install-btn--solid .install-btn--sm }
|
|
|
|
- :material-language-ruby:{ .lg .middle } **Ruby**
|
|
|
|
***
|
|
|
|
```bash
|
|
gem install kreuzberg
|
|
```
|
|
|
|
[API Reference](../reference/api-ruby.md){ .install-api-link }
|
|
[:material-lightning-bolt: Quick Start](quickstart.md){ .install-btn .install-btn--solid .install-btn--sm }
|
|
|
|
- :fontawesome-brands-swift:{ .lg .middle } **Swift**
|
|
|
|
***
|
|
|
|
```swift
|
|
.package(url: "https://github.com/kreuzberg-dev/kreuzberg.git", from: "5.0.0-rc.1")
|
|
```
|
|
|
|
[API Reference](../reference/api-swift.md){ .install-api-link }
|
|
[:material-lightning-bolt: Quick Start](#swift){ .install-btn .install-btn--solid .install-btn--sm }
|
|
|
|
- :material-language-csharp:{ .lg .middle } **C# / .NET**
|
|
|
|
***
|
|
|
|
```bash
|
|
dotnet add package Kreuzberg
|
|
```
|
|
|
|
[API Reference](../reference/api-csharp.md){ .install-api-link }
|
|
[:material-lightning-bolt: Quick Start](../reference/api-csharp.md){ .install-btn .install-btn--solid .install-btn--sm }
|
|
|
|
- :fontawesome-brands-php:{ .lg .middle } **PHP**
|
|
|
|
***
|
|
|
|
```bash
|
|
composer require kreuzberg/kreuzberg
|
|
```
|
|
|
|
[API Reference](../reference/api-php.md){ .install-api-link }
|
|
[:material-lightning-bolt: Quick Start](quickstart.md){ .install-btn .install-btn--solid .install-btn--sm }
|
|
|
|
- :simple-elixir:{ .lg .middle } **Elixir**
|
|
|
|
***
|
|
|
|
```elixir
|
|
{:kreuzberg, "~> 5.0.0-rc.1"}
|
|
```
|
|
|
|
[API Reference](../reference/api-elixir.md){ .install-api-link }
|
|
[:material-lightning-bolt: Quick Start](#elixir){ .install-btn .install-btn--solid .install-btn--sm }
|
|
|
|
- :simple-r:{ .lg .middle } **R**
|
|
|
|
***
|
|
|
|
```r
|
|
install.packages("kreuzberg",
|
|
repos = "https://kreuzberg-dev.r-universe.dev")
|
|
```
|
|
|
|
[API Reference](../reference/api-r.md){ .install-api-link }
|
|
[:material-lightning-bolt: Quick Start](quickstart.md){ .install-btn .install-btn--solid .install-btn--sm }
|
|
|
|
- :simple-cplusplus:{ .lg .middle } **C / C++**
|
|
|
|
***
|
|
|
|
```bash
|
|
cargo build -p kreuzberg-ffi
|
|
```
|
|
|
|
[API Reference](../reference/api-c.md){ .install-api-link }
|
|
[:material-lightning-bolt: Quick Start](#c-c){ .install-btn .install-btn--solid .install-btn--sm }
|
|
|
|
- :material-language-dart:{ .lg .middle } **Dart / Flutter**
|
|
|
|
***
|
|
|
|
```bash
|
|
dart pub add kreuzberg
|
|
```
|
|
|
|
[API Reference](../reference/api-dart.md){ .install-api-link }
|
|
[:material-lightning-bolt: Quick Start](#dart){ .install-btn .install-btn--solid .install-btn--sm }
|
|
|
|
- :material-language-zig:{ .lg .middle } **Zig**
|
|
|
|
***
|
|
|
|
```bash
|
|
zig fetch --save https://github.com/kreuzberg-dev/kreuzberg/archive/refs/tags/v5.0.0-rc.1.tar.gz
|
|
```
|
|
|
|
[API Reference](../reference/api-zig.md){ .install-api-link }
|
|
[:material-lightning-bolt: Quick Start](#zig){ .install-btn .install-btn--solid .install-btn--sm }
|
|
|
|
</div>
|
|
|
|
---
|
|
|
|
## System requirements
|
|
|
|
Only relevant if building from source or enabling OCR:
|
|
|
|
| Dependency | When you need it |
|
|
| ------------------------- | -------------------------------------------------------------------------------------- |
|
|
| AVX/AVX2 CPU instructions | Required for ONNX Runtime features (PaddleOCR, layout detection, embeddings) on x86_64 |
|
|
| Rust toolchain (`rustup`) | Building any native binding from source |
|
|
| C/C++ compiler | Building native bindings (Xcode command-line tools / `build-essential` / MSVC) |
|
|
| Tesseract OCR | Optional — `brew install tesseract` / `apt install tesseract-ocr` |
|
|
| PDFium | Auto-fetched during builds |
|
|
|
|
The Wasm package (`@kreuzberg/wasm`) has **zero** system dependencies.
|
|
|
|
### GPU Acceleration
|
|
|
|
Kreuzberg bundles a CPU-only ONNX Runtime — ML features (PaddleOCR, layout detection, embeddings) work out of the box on CPU.
|
|
|
|
For GPU acceleration, install a GPU-enabled ONNX Runtime and set `ORT_DYLIB_PATH`:
|
|
|
|
| Platform | Install | Set ORT_DYLIB_PATH |
|
|
| --------------- | ---------------------------------------------------------------------------------------- | -------------------------------------------------- |
|
|
| Linux (CUDA) | Download from [ONNX Runtime releases](https://github.com/microsoft/onnxruntime/releases) | `export ORT_DYLIB_PATH=/path/to/libonnxruntime.so` |
|
|
| Python (any OS) | `pip install onnxruntime-gpu` | Point at the pip package's `capi/` directory |
|
|
| macOS (CoreML) | Works with bundled ORT — no extra setup needed | — |
|
|
|
|
See [AccelerationConfig](../reference/configuration.md#accelerationconfig) and [ORT_DYLIB_PATH](../reference/environment-variables.md#ort_dylib_path) for details.
|
|
|
|
---
|
|
|
|
## Language-specific notes
|
|
|
|
Edge cases and alternative install methods where they come up.
|
|
|
|
### TypeScript
|
|
|
|
Two npm packages target different runtimes:
|
|
|
|
| Package | Best for | Performance |
|
|
| ----------------- | ---------------------------------- | -------------- |
|
|
| `@kreuzberg/node` | Node.js, Bun — server-side apps | Native (100%) |
|
|
| `@kreuzberg/wasm` | Browsers, Deno, Cloudflare Workers | Wasm (~60-80%) |
|
|
|
|
Both work with **pnpm** (`pnpm add`) and **Yarn** (`yarn add`) as well.
|
|
|
|
!!! Note "pnpm workspaces"
|
|
|
|
In monorepos, add this to your root `.npmrc` so platform-specific optional deps resolve correctly:
|
|
|
|
```ini
|
|
auto-install-peers=true
|
|
```
|
|
|
|
??? Note "Wasm — Browser usage"
|
|
|
|
```html
|
|
<script type="module">
|
|
import { initWasm, extractFromFile } from "@kreuzberg/wasm";
|
|
|
|
await initWasm();
|
|
|
|
const input = document.getElementById("file");
|
|
input.addEventListener("change", async (e) => {
|
|
const result = await extractFromFile(e.target.files[0]);
|
|
console.log(result.content);
|
|
});
|
|
</script>
|
|
|
|
<input type="file" id="file" />
|
|
```
|
|
|
|
??? Note "Wasm — Deno"
|
|
|
|
```typescript
|
|
import { initWasm, extractFile } from "npm:@kreuzberg/wasm";
|
|
|
|
await initWasm();
|
|
const result = await extractFile("./document.pdf");
|
|
console.log(result.content);
|
|
```
|
|
|
|
??? Note "Wasm — Cloudflare Workers"
|
|
|
|
```typescript
|
|
import { initWasm, extractBytes } from "@kreuzberg/wasm";
|
|
|
|
export default {
|
|
async fetch(request: Request): Promise<Response> {
|
|
await initWasm();
|
|
const bytes = new Uint8Array(await request.arrayBuffer());
|
|
const result = await extractBytes(bytes, "application/pdf");
|
|
return Response.json({ content: result.content });
|
|
},
|
|
};
|
|
```
|
|
|
|
**Supported runtimes:** Chrome 74+, Firefox 79+, Safari 14+, Edge 79+, Node.js 22+, Deno 1.35+, Cloudflare Workers.
|
|
|
|
!!! Warning "Wasm Platform Limitations"
|
|
|
|
The Wasm binding does not support:
|
|
|
|
- **Layout detection** (RT-DETR model inference requires ONNX Runtime unavailable in WebAssembly)
|
|
- **Hardware acceleration config** (single-threaded WASM, no GPU access)
|
|
- **Concurrency config** (single-threaded environment, `maxThreads` is ignored)
|
|
- **Email codepage config** (EmailConfig not available)
|
|
|
|
All other features (text extraction, OCR via Tesseract WASM, chunking, embeddings, metadata, tables, language detection, image extraction) work fully in WASM. See the [WASM API Reference](../reference/api-wasm.md) for details.
|
|
|
|
### Java
|
|
|
|
=== "Maven"
|
|
|
|
```xml
|
|
<dependency>
|
|
<groupId>dev.kreuzberg</groupId>
|
|
<artifactId>kreuzberg</artifactId>
|
|
<version>5.0.0-rc.1</version>
|
|
</dependency>
|
|
```
|
|
|
|
=== "Gradle"
|
|
|
|
```gradle
|
|
implementation 'dev.kreuzberg:kreuzberg:5.0.0-rc.1'
|
|
```
|
|
|
|
Requires Java 25+ (FFM/Panama API). Native libraries are bundled in the JAR.
|
|
|
|
### Elixir
|
|
|
|
Add to `mix.exs`:
|
|
|
|
```elixir
|
|
def deps do
|
|
[
|
|
{:kreuzberg, "~> 5.0.0-rc.1"}
|
|
]
|
|
end
|
|
```
|
|
|
|
```bash
|
|
mix deps.get
|
|
```
|
|
|
|
Ships prebuilt NIF binaries via RustlerPrecompiled. Falls back to compiling from source if no prebuilt matches your platform (requires Rust).
|
|
|
|
!!! Warning "Windows"
|
|
|
|
The Windows NIF links against ONNX Runtime dynamically. `onnxruntime.dll` must be on your `PATH` at runtime — see the note at the top of this page.
|
|
|
|
### Go
|
|
|
|
```bash
|
|
go get github.com/kreuzberg-dev/kreuzberg/v5@latest
|
|
```
|
|
|
|
!!! Warning "Windows"
|
|
|
|
The Go binding links against ONNX Runtime dynamically on Windows. `onnxruntime.dll` must be on your `PATH` at runtime — see the note at the top of this page.
|
|
|
|
!!! Note "Windows feature limitations"
|
|
|
|
The Go and C/C++ bindings on Windows (MinGW/GNU target) do not include **PaddleOCR**, **layout detection**, or **auto-rotate**. Tesseract OCR and all other features work normally. These limitations apply only to Windows; Linux and macOS builds include the full feature set.
|
|
|
|
### Rust
|
|
|
|
Enable features selectively in `Cargo.toml`:
|
|
|
|
```toml title="Cargo.toml"
|
|
[dependencies]
|
|
kreuzberg = { version = "4", features = ["tokio-runtime"] }
|
|
# Optional features: pdf, ocr, chunking
|
|
```
|
|
|
|
### C / C++
|
|
|
|
Build the FFI library from source:
|
|
|
|
```bash
|
|
cargo build --release -p kreuzberg-ffi
|
|
```
|
|
|
|
This produces `libkreuzberg_ffi.a` and a header at `crates/kreuzberg-ffi/kreuzberg.h`. Link into your project:
|
|
|
|
```makefile
|
|
HEADER_DIR = path/to/crates/kreuzberg-ffi
|
|
LIBDIR = path/to/target/release
|
|
|
|
CFLAGS = -Wall -Wextra -I$(HEADER_DIR)
|
|
LDFLAGS = -L$(LIBDIR) -lkreuzberg_ffi -lpthread -ldl -lm
|
|
|
|
my_app: my_app.c
|
|
$(CC) $(CFLAGS) -o $@ $< $(LDFLAGS)
|
|
```
|
|
|
|
!!! Tip "Platform-specific linker flags"
|
|
|
|
**macOS:** add `-framework CoreFoundation -framework Security`
|
|
|
|
**Windows:** add `-lws2_32 -luserenv -lbcrypt`
|
|
|
|
!!! Warning "Windows"
|
|
|
|
The Windows FFI library links against ONNX Runtime dynamically. `onnxruntime.dll` must be on your `PATH` at runtime — see the note at the top of this page.
|
|
|
|
[API Reference →](../reference/api-c.md)
|
|
|
|
### Dart / Flutter { #dart }
|
|
|
|
Pure-Dart and Flutter consumers share the same package. Dart SDK 3.0 or higher is required. Flutter is supported on macOS, iOS, Android, Linux, and Windows; Flutter Web is not supported because the runtime is a native dynamic library delivered via flutter_rust_bridge. For Flutter projects use `flutter pub add kreuzberg` instead of `dart pub add kreuzberg`.
|
|
|
|
### Kotlin { #kotlin }
|
|
|
|
The Kotlin module sits on top of the Java facade and reuses its Foreign Function & Memory native loader, so the same bundled binaries serve both bindings. Requires JDK 25 or higher. Use the Kotlin DSL block above for `build.gradle.kts` consumers; Maven and Groovy DSL are also supported — see the README at packages/kotlin/ for both.
|
|
|
|
### Swift { #swift }
|
|
|
|
Swift Package Manager from `swift-tools-version: 6.0` upward. Targets macOS 13+ and iOS 16+; Linux is not currently declared in `Package.swift`. Once the package ships its `binaryTarget`, no manual cargo build is needed; in the interim, building the library locally requires `cargo build -p kreuzberg-swift` against the workspace.
|
|
|
|
### Zig { #zig }
|
|
|
|
Requires Zig 0.16.0 or higher (declared via `minimum_zig_version` in `build.zig.zon`). The Zig binding consumes the C FFI surface from `kreuzberg-ffi` via `linkSystemLibrary`; the build expects the consumer to provide a search path to the prebuilt `libkreuzberg_ffi` and the C header `kreuzberg.h`. The `zig fetch` command above pins the source archive in `build.zig.zon`; wire it into `build.zig` via `b.dependency("kreuzberg", ...)`.
|
|
|
|
---
|
|
|
|
## Development setup
|
|
|
|
For working on the Kreuzberg repository itself:
|
|
|
|
```bash
|
|
task setup # installs all language toolchains
|
|
task lint # linters across all languages
|
|
task dev:test # full test suite
|
|
```
|
|
|
|
See [Contributing](../contributing.md) for conventions and expectations.
|