Files
fil/docs/getting-started/installation.md
Henrik Jess Nielsen b4c07d3693
All checks were successful
Deploy fil (kreuzberg) / deploy (push) Successful in 49s
Nomad changes
2026-06-01 23:40:55 +02:00

16 KiB

description
description
Install Kreuzberg — pick Python, TypeScript, Rust, Go, CLI/Docker, or any of 12 supported languages.

Installation

Native bindings for 17 languages plus a standalone CLI. Every package ships prebuilt binaries for Linux (x86_64/aarch64), macOS (Apple Silicon), and Windows — no compile step needed.

:material-console: CLI / Docker

No SDK, no code — just your terminal.

=== "Install script"

```bash
curl -fsSL https://raw.githubusercontent.com/kreuzberg-dev/kreuzberg/main/scripts/install.sh | bash
```

=== "Homebrew"

```bash
brew install kreuzberg-dev/tap/kreuzberg
```

=== "Cargo"

```bash
cargo install kreuzberg-cli
```

=== "Docker (CLI image)"

```bash
docker pull ghcr.io/kreuzberg-dev/kreuzberg-cli:latest
docker run -v $(pwd):/data ghcr.io/kreuzberg-dev/kreuzberg-cli:latest extract /data/document.pdf
```

=== "Docker (full image)"

```bash
docker pull ghcr.io/kreuzberg-dev/kreuzberg:latest
```

CLI Usage{ .install-btn .install-btn--ghost } API Server Guide{ .install-btn .install-btn--solid }

!!! Warning "x86_64 CPU — AVX/AVX2 instruction set required"

The bundled ONNX Runtime binaries require **AVX/AVX2** CPU instructions. CPUs without AVX support (e.g. Intel Atom, Celeron N5105/Jasper Lake, older pre-2011 processors) will crash with an `invalid opcode` trap when using ONNX-dependent features. The affected features are **PaddleOCR**, **layout detection**, and **embeddings**. All other Kreuzberg functionality (text extraction, Tesseract OCR, chunking, metadata, etc.) works normally on any x86_64 CPU. ARM platforms (aarch64) are unaffected.

!!! Warning "Windows — ONNX Runtime required for Go, Elixir, and C/C++"

Go, Elixir, and C/C++ bindings on Windows link against ONNX Runtime dynamically. You must have `onnxruntime.dll` on your `PATH` at runtime. Download it from the [ONNX Runtime releases](https://github.com/microsoft/onnxruntime/releases) (for example `onnxruntime-win-x64-1.24.1.zip`). Python, TypeScript, Java, C#, Ruby, PHP, and Wasm are unaffected.

Choose your language


System requirements

Only relevant if building from source or enabling OCR:

Dependency When you need it
AVX/AVX2 CPU instructions Required for ONNX Runtime features (PaddleOCR, layout detection, embeddings) on x86_64
Rust toolchain (rustup) Building any native binding from source
C/C++ compiler Building native bindings (Xcode command-line tools / build-essential / MSVC)
Tesseract OCR Optional — brew install tesseract / apt install tesseract-ocr
PDFium Auto-fetched during builds

The Wasm package (@kreuzberg/wasm) has zero system dependencies.

GPU Acceleration

Kreuzberg bundles a CPU-only ONNX Runtime — ML features (PaddleOCR, layout detection, embeddings) work out of the box on CPU.

For GPU acceleration, install a GPU-enabled ONNX Runtime and set ORT_DYLIB_PATH:

Platform Install Set ORT_DYLIB_PATH
Linux (CUDA) Download from ONNX Runtime releases export ORT_DYLIB_PATH=/path/to/libonnxruntime.so
Python (any OS) pip install onnxruntime-gpu Point at the pip package's capi/ directory
macOS (CoreML) Works with bundled ORT — no extra setup needed

See AccelerationConfig and ORT_DYLIB_PATH for details.


Language-specific notes

Edge cases and alternative install methods where they come up.

TypeScript

Two npm packages target different runtimes:

Package Best for Performance
@kreuzberg/node Node.js, Bun — server-side apps Native (100%)
@kreuzberg/wasm Browsers, Deno, Cloudflare Workers Wasm (~60-80%)

Both work with pnpm (pnpm add) and Yarn (yarn add) as well.

!!! Note "pnpm workspaces"

In monorepos, add this to your root `.npmrc` so platform-specific optional deps resolve correctly:

```ini
auto-install-peers=true
```

??? Note "Wasm — Browser usage"

```html
<script type="module">
  import { initWasm, extractFromFile } from "@kreuzberg/wasm";

  await initWasm();

  const input = document.getElementById("file");
  input.addEventListener("change", async (e) => {
    const result = await extractFromFile(e.target.files[0]);
    console.log(result.content);
  });
</script>

<input type="file" id="file" />
```

??? Note "Wasm — Deno"

```typescript
import { initWasm, extractFile } from "npm:@kreuzberg/wasm";

await initWasm();
const result = await extractFile("./document.pdf");
console.log(result.content);
```

??? Note "Wasm — Cloudflare Workers"

```typescript
import { initWasm, extractBytes } from "@kreuzberg/wasm";

export default {
  async fetch(request: Request): Promise<Response> {
    await initWasm();
    const bytes = new Uint8Array(await request.arrayBuffer());
    const result = await extractBytes(bytes, "application/pdf");
    return Response.json({ content: result.content });
  },
};
```

Supported runtimes: Chrome 74+, Firefox 79+, Safari 14+, Edge 79+, Node.js 22+, Deno 1.35+, Cloudflare Workers.

!!! Warning "Wasm Platform Limitations"

The Wasm binding does not support:

- **Layout detection** (RT-DETR model inference requires ONNX Runtime unavailable in WebAssembly)
- **Hardware acceleration config** (single-threaded WASM, no GPU access)
- **Concurrency config** (single-threaded environment, `maxThreads` is ignored)
- **Email codepage config** (EmailConfig not available)

All other features (text extraction, OCR via Tesseract WASM, chunking, embeddings, metadata, tables, language detection, image extraction) work fully in WASM. See the [WASM API Reference](../reference/api-wasm.md) for details.

Java

=== "Maven"

```xml
<dependency>
    <groupId>dev.kreuzberg</groupId>
    <artifactId>kreuzberg</artifactId>
    <version>5.0.0-rc.1</version>
</dependency>
```

=== "Gradle"

```gradle
implementation 'dev.kreuzberg:kreuzberg:5.0.0-rc.1'
```

Requires Java 25+ (FFM/Panama API). Native libraries are bundled in the JAR.

Elixir

Add to mix.exs:

def deps do
  [
    {:kreuzberg, "~> 5.0.0-rc.1"}
  ]
end
mix deps.get

Ships prebuilt NIF binaries via RustlerPrecompiled. Falls back to compiling from source if no prebuilt matches your platform (requires Rust).

!!! Warning "Windows"

The Windows NIF links against ONNX Runtime dynamically. `onnxruntime.dll` must be on your `PATH` at runtime — see the note at the top of this page.

Go

go get github.com/kreuzberg-dev/kreuzberg/v5@latest

!!! Warning "Windows"

The Go binding links against ONNX Runtime dynamically on Windows. `onnxruntime.dll` must be on your `PATH` at runtime — see the note at the top of this page.

!!! Note "Windows feature limitations"

The Go and C/C++ bindings on Windows (MinGW/GNU target) do not include **PaddleOCR**, **layout detection**, or **auto-rotate**. Tesseract OCR and all other features work normally. These limitations apply only to Windows; Linux and macOS builds include the full feature set.

Rust

Enable features selectively in Cargo.toml:

[dependencies]
kreuzberg = { version = "4", features = ["tokio-runtime"] }
# Optional features: pdf, ocr, chunking

C / C++

Build the FFI library from source:

cargo build --release -p kreuzberg-ffi

This produces libkreuzberg_ffi.a and a header at crates/kreuzberg-ffi/kreuzberg.h. Link into your project:

HEADER_DIR = path/to/crates/kreuzberg-ffi
LIBDIR     = path/to/target/release

CFLAGS  = -Wall -Wextra -I$(HEADER_DIR)
LDFLAGS = -L$(LIBDIR) -lkreuzberg_ffi -lpthread -ldl -lm

my_app: my_app.c
	$(CC) $(CFLAGS) -o $@ $< $(LDFLAGS)

!!! Tip "Platform-specific linker flags"

**macOS:** add `-framework CoreFoundation -framework Security`

**Windows:** add `-lws2_32 -luserenv -lbcrypt`

!!! Warning "Windows"

The Windows FFI library links against ONNX Runtime dynamically. `onnxruntime.dll` must be on your `PATH` at runtime — see the note at the top of this page.

API Reference →

Dart / Flutter

Pure-Dart and Flutter consumers share the same package. Dart SDK 3.0 or higher is required. Flutter is supported on macOS, iOS, Android, Linux, and Windows; Flutter Web is not supported because the runtime is a native dynamic library delivered via flutter_rust_bridge. For Flutter projects use flutter pub add kreuzberg instead of dart pub add kreuzberg.

Kotlin

The Kotlin module sits on top of the Java facade and reuses its Foreign Function & Memory native loader, so the same bundled binaries serve both bindings. Requires JDK 25 or higher. Use the Kotlin DSL block above for build.gradle.kts consumers; Maven and Groovy DSL are also supported — see the README at packages/kotlin/ for both.

Swift

Swift Package Manager from swift-tools-version: 6.0 upward. Targets macOS 13+ and iOS 16+; Linux is not currently declared in Package.swift. Once the package ships its binaryTarget, no manual cargo build is needed; in the interim, building the library locally requires cargo build -p kreuzberg-swift against the workspace.

Zig

Requires Zig 0.16.0 or higher (declared via minimum_zig_version in build.zig.zon). The Zig binding consumes the C FFI surface from kreuzberg-ffi via linkSystemLibrary; the build expects the consumer to provide a search path to the prebuilt libkreuzberg_ffi and the C header kreuzberg.h. The zig fetch command above pins the source archive in build.zig.zon; wire it into build.zig via b.dependency("kreuzberg", ...).


Development setup

For working on the Kreuzberg repository itself:

task setup      # installs all language toolchains
task lint        # linters across all languages
task dev:test    # full test suite

See Contributing for conventions and expectations.