14 KiB
Generated
Kreuzberg
High-performance document intelligence for Go backed by the Rust core that powers every Kreuzberg binding.
Version 5.0.0-rc.3 Report issues at github.com/kreuzberg-dev/kreuzberg.
What This Package Provides
- Go module over the Rust core — context-aware extraction with Go structs and errors.
- Structured results — text, tables, images, metadata, language detection, chunks, and warnings.
- Static-link workflow — build against
kreuzberg-ffiand ship a self-contained Go binary. - Cross-binding parity — output matches the Python, Node.js, Ruby, Java, .NET, PHP, Elixir, R, Dart, Swift, Zig, WASM, and C FFI packages.
Install
Kreuzberg Go binaries are statically linked — once built, they are self-contained and require no runtime library dependencies. Only the static library is needed at build time.
Quick Start (Monorepo Development)
For development in the Kreuzberg monorepo:
# Build the static FFI library
cargo build -p kreuzberg-ffi --release
# Go build will automatically link against the static library
# (from target/release/libkreuzberg_ffi.a)
cd packages/go/v5
go build -v
# Run your binary (no library path needed - it's statically linked)
./v4
That's it! The resulting binary is self-contained and has no runtime dependencies on Kreuzberg libraries.
Using Go Modules
To use this package via go get:
# Get the latest release
go get github.com/kreuzberg-dev/kreuzberg/v5@latest
# Or a specific version
go get github.com/kreuzberg-dev/kreuzberg/v5@v5.0.0-rc.3
You'll need to provide the static library at build time. See Building with Static Libraries below.
Building with Static Libraries
When building outside the Kreuzberg monorepo, you need to provide the static library (.a file on Unix, .lib on Windows).
Option 1: Download Pre-built Static Library
Download the static library for your platform from GitHub Releases:
# Example: Linux x86_64
curl -LO https://github.com/kreuzberg-dev/kreuzberg/releases/download/v5.0.0-rc.3/go-ffi-linux-x86_64.tar.gz
tar -xzf go-ffi-linux-x86_64.tar.gz
# Copy to a permanent location
mkdir -p ~/kreuzberg/lib
cp kreuzberg-ffi/lib/libkreuzberg_ffi.a ~/kreuzberg/lib/
Then build with CGO_LDFLAGS:
# Linux/macOS
CGO_LDFLAGS="-L$HOME/kreuzberg/lib -lkreuzberg_ffi" go build
# Windows (MSVC)
set CGO_LDFLAGS=-L%USERPROFILE%\kreuzberg\lib -lkreuzberg_ffi
go build
Option 2: Build Static Library Yourself
If pre-built libraries aren't available for your platform:
# Clone the repository
git clone https://github.com/kreuzberg-dev/kreuzberg.git
cd kreuzberg
# Build the static library
cargo build -p kreuzberg-ffi --release
# The static library is now at: target/release/libkreuzberg_ffi.a
# Copy it to a permanent location
mkdir -p ~/kreuzberg/lib
cp target/release/libkreuzberg_ffi.a ~/kreuzberg/lib/
# Now you can build Go projects
cd ~/my-go-project
CGO_LDFLAGS="-L$HOME/kreuzberg/lib -lkreuzberg_ffi" go build
System Requirements
ONNX Runtime (for embeddings)
If using embeddings functionality, ONNX Runtime must be installed at build time:
# macOS
brew install onnxruntime
# Ubuntu/Debian
sudo apt install libonnxruntime libonnxruntime-dev
# Windows (MSVC)
scoop install onnxruntime
# OR download from https://github.com/microsoft/onnxruntime/releases
The resulting binary will have ONNX Runtime statically linked or dynamically linked depending on how the FFI library was built. Check the build configuration.
Note: Windows MinGW builds do not support embeddings (ONNX Runtime requires MSVC). Use Windows MSVC for embeddings support.
Quickstart
package main
import (
"fmt"
"log"
"github.com/kreuzberg-dev/kreuzberg/v5"
)
func main() {
result, err := v4.ExtractFileSync("document.pdf", nil)
if err != nil {
log.Fatalf("extract failed: %v", err)
}
fmt.Println("MIME:", result.MimeType)
fmt.Println("First 200 chars:")
fmt.Println(result.Content[:200])
}
Build and run:
# Build (make sure you have the static library available - see Install)
CGO_LDFLAGS="-L$HOME/kreuzberg/lib -lkreuzberg_ffi" go build
# Run - no library paths needed!
./myapp
The binary is self-contained and can be distributed without any Kreuzberg library dependencies.
Examples
Extract bytes
data, err := os.ReadFile("slides.pptx")
if err != nil {
log.Fatal(err)
}
result, err := v4.ExtractBytesSync(data, "application/vnd.openxmlformats-officedocument.presentationml.presentation", nil)
if err != nil {
log.Fatal(err)
}
fmt.Println(result.Metadata.FormatType())
Use advanced configuration
lang := "eng"
cfg := &v4.ExtractionConfig{
UseCache: true,
ForceOCR: false,
ImageExtraction: &v4.ImageExtractionConfig{Enabled: true},
OCR: &v4.OcrConfig{
Backend: "tesseract",
Language: &lang,
},
}
result, err := v4.ExtractFileSync("scanned.pdf", cfg)
Async (context-aware) extraction
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
result, err := v4.ExtractFile(ctx, "large.pdf", nil)
if err != nil {
log.Fatal(err)
}
fmt.Println("Content length:", len(result.Content))
Batch extract
paths := []string{"doc1.pdf", "doc2.docx", "report.xlsx"}
results, err := v4.BatchExtractFilesSync(paths, nil)
if err != nil {
log.Fatal(err)
}
for i, res := range results {
if res == nil {
continue
}
fmt.Printf("[%d] %s => %d bytes\n", i, res.MimeType, len(res.Content))
}
Register a validator
//export customValidator
func customValidator(resultJSON *C.char) *C.char {
// Validate JSON payload and return an error string (or NULL if ok)
return nil
}
func init() {
if err := v4.RegisterValidator("go-validator", 50, (C.ValidatorCallback)(C.customValidator)); err != nil {
log.Fatalf("validator registration failed: %v", err)
}
}
API Reference
- GoDoc: pkg.go.dev/github.com/kreuzberg-dev/kreuzberg/v5
- Full documentation: kreuzberg.dev (configuration, formats, OCR backends)
Troubleshooting
| Issue | Fix |
|---|---|
ld returned 1 exit status or undefined reference to 'html_to_markdown_...' |
The static library wasn't found. Make sure CGO_LDFLAGS points to the directory containing libkreuzberg_ffi.a: CGO_LDFLAGS="-L/path/to/lib -lkreuzberg_ffi" go build |
cannot find -lkreuzberg_ffi |
The static library file is missing or in the wrong location. Download it from GitHub Releases or build it yourself: cargo build -p kreuzberg-ffi --release |
undefined: v4.ExtractFile |
This function was removed in v4.1.0. Use ExtractFileSync and wrap in goroutine if needed (see migration guide) |
Missing dependency: tesseract |
Install the OCR backend and ensure it is on PATH. Errors bubble up as *v4.MissingDependencyError. |
undefined: C.customValidator during build |
Export the callback with //export in a *_cgo.go file before using it in Register* helpers. |
Missing dependency: onnxruntime |
Install ONNX Runtime at build time: brew install onnxruntime (macOS), apt install libonnxruntime libonnxruntime-dev (Linux), scoop install onnxruntime (Windows). Required for embeddings functionality. |
| Embeddings not available on Windows MinGW | Windows MinGW builds cannot link ONNX Runtime (MSVC-only). Use Windows MSVC build for embeddings support, or build without embeddings feature. |
Testing / Tooling
task go:lint– runsgofmtandgolangci-lint(golangci-lintpinned to v2.11.3).task go:test– executesgo test ./...(after building the static FFI library).task e2e:go:verify– regenerates fixtures via the e2e generator and runsgo test ./...insidee2e/go.
Need help? Join the Discord or open an issue with logs, platform info, and the steps you tried.
Part of Kreuzberg.dev
- Kreuzberg Cloud — managed extraction API with SDKs, dashboards, and observability.
- kreuzcrawl — web crawling and scraping with HTML→Markdown and headless-Chrome fallback.
- html-to-markdown — fast, lossless HTML→Markdown engine.
- liter-llm — universal LLM API client with native bindings for 14 languages and 143 providers.
- tree-sitter-language-pack — tree-sitter grammars and code-intelligence primitives.
- alef — the polyglot binding generator that produces this README and all per-language bindings.
- Discord — community, roadmap, announcements.