Nomad changes

Henrik Jess Nielsen
2026-06-01 23:40:55 +02:00
parent 72b1a0a6ed
commit b4c07d3693
5723 changed files with 1130655 additions and 0 deletions

packages/go/.golangci.yml Normal file

@@ -0,0 +1,118 @@
version: "2"
run:
  timeout: 5m
  issues-exit-code: 1
  tests: true
  concurrency: 4
  modules-download-mode: readonly
  allow-serial-runners: false
  allow-parallel-runners: true
linters:
  default: none
  enable:
    - errcheck
    - govet
    - ineffassign
    - staticcheck
    - unused
    - revive
    - gocyclo
    - goconst
    - gocritic
    - gosec
    - misspell
    - nakedret
  settings:
    errcheck:
      check-type-assertions: true
      check-blank: true
      exclude-functions:
        - (net/http.ResponseWriter).Write
        - (io.Closer).Close
        - fmt.Fprintf
        - fmt.Printf
        - fmt.Println
        - os.Setenv
        - os.Unsetenv
    goconst:
      min-len: 3
      min-occurrences: 3
    gocyclo:
      min-complexity: 50
    govet:
      enable-all: true
      disable:
        - shadow
    gocritic:
      disabled-checks:
        - dupSubExpr
    misspell:
      locale: US
    nakedret:
      max-func-lines: 30
    revive:
      confidence: 0.8
      severity: warning
      enable-all-rules: false
      rules:
        - name: blank-imports
        - name: context-keys-type
        - name: time-naming
        - name: var-declaration
        - name: unexported-return
        - name: errorf
        - name: context-as-argument
        - name: dot-imports
        - name: error-return
        - name: error-strings
        - name: error-naming
        - name: if-return
        - name: increment-decrement
        - name: var-naming
        - name: range
        - name: receiver-naming
        - name: indent-error-flow
        - name: exported
          disabled: true
        - name: package-comments
          disabled: true
  exclusions:
    generated: lax
    rules:
      - linters:
          - goconst
        path: _test\.go
      - linters:
          - gocyclo
        path: _test\.go
      - linters:
          - gosec
        path: _test\.go
      - linters:
          - revive
        path: _test\.go
        text: "context-as-argument"
      - linters:
          - govet
        text: "fieldalignment:"
      - linters:
          - govet
        text: "unsafeptr:"
    paths:
      - vendor
      - build
      - third_party$
issues:
  max-issues-per-linter: 0
  max-same-issues: 0
  uniq-by-line: true
  new: false
formatters:
  exclusions:
    generated: lax
    paths:
      - third_party$

packages/go/LICENSE Normal file

@@ -0,0 +1,93 @@
Elastic License 2.0 (ELv2)
Copyright 2025-2026 Kreuzberg, Inc.
Acceptance
By using the software, you agree to all of the terms and conditions below.
Copyright License
The licensor grants you a non-exclusive, royalty-free, worldwide,
non-sublicensable, non-transferable license to use, copy, distribute, make
available, and prepare derivative works of the software, in each case subject to
the limitations and conditions below.
Limitations
You may not provide the software to third parties as a hosted or managed
service, where the service provides users with access to any substantial set of
the features or functionality of the software.
You may not move, change, disable, or circumvent the license key functionality
in the software, and you may not remove or obscure any functionality in the
software that is protected by the license key.
You may not alter, remove, or obscure any licensing, copyright, or other notices
of the licensor in the software. Any use of the licensor's trademarks is subject
to applicable law.
Patents
The licensor grants you a license, under any patent claims the licensor can
license, or becomes able to license, to make, have made, use, sell, offer for
sale, import and have imported the software, in each case subject to the
limitations and conditions in this license. This license does not cover any
patent claims that you cause to be infringed by modifications or additions to the
software. If you or your company make any written claim that the software
infringes or contributes to infringement of any patent, your patent license for
the software granted under these terms ends immediately. If your company makes
such a claim, your patent license ends immediately for work on behalf of your
company.
Notices
You must ensure that anyone who gets a copy of any part of the software from you
also gets a copy of these terms.
If you modify the software, you must include in any modified copies of the
software prominent notices stating that you have modified the software.
No Other Rights
These terms do not imply any licenses other than those expressly granted in
these terms.
Termination
If you use the software in violation of these terms, such use is not licensed,
and your licenses will automatically terminate. If the licensor provides you with
a notice of your violation, and you cease all violation of this license no later
than 30 days after you receive that notice, your licenses will be reinstated
retroactively. However, if you violate these terms after such reinstatement, any
additional violation of these terms will cause your licenses to terminate
automatically and permanently.
No Liability
As far as the law allows, the software comes as is, without any warranty or
condition, and the licensor will not be liable to you for any damages arising out
of these terms or the use or nature of the software, under any kind of legal
claim.
Definitions
The licensor is the entity offering these terms, and the software is the
software the licensor makes available under these terms, including any portion
of it.
you refers to the individual or entity agreeing to these terms.
your company is any legal entity, sole proprietorship, or other kind of
organization that you work for, plus all organizations that have control over,
are under the control of, or are under common control with that organization.
control means ownership of substantially all the assets of an entity, or the
power to direct its management and policies by vote, contract, or otherwise.
Control can be direct or indirect.
your licenses are all the licenses granted to you for the software under these
terms.
use means anything you do with the software requiring one of your licenses.
trademark means trademarks, service marks, and similar rights.

packages/go/go.mod Normal file

@@ -0,0 +1,3 @@
module github.com/kreuzberg-dev/kreuzberg/v5
go 1.26

packages/go/v4/README.md Normal file

@@ -0,0 +1,341 @@
# Kreuzberg
<div align="center" style="display: flex; flex-wrap: wrap; gap: 8px; justify-content: center; margin: 20px 0;">
<a href="https://github.com/kreuzberg-dev/alef">
<img src="https://img.shields.io/badge/Bindings-alef%20%D7%90-007ec6" alt="Bindings">
</a>
<!-- Language Bindings -->
<a href="https://crates.io/crates/kreuzberg">
<img src="https://img.shields.io/crates/v/kreuzberg?label=Rust&color=007ec6" alt="Rust">
</a>
<a href="https://pypi.org/project/kreuzberg/">
<img src="https://img.shields.io/pypi/v/kreuzberg?label=Python&color=007ec6" alt="Python">
</a>
<a href="https://www.npmjs.com/package/@kreuzberg/node">
<img src="https://img.shields.io/npm/v/@kreuzberg/node?label=Node.js&color=007ec6" alt="Node.js">
</a>
<a href="https://www.npmjs.com/package/@kreuzberg/wasm">
<img src="https://img.shields.io/npm/v/@kreuzberg/wasm?label=WASM&color=007ec6" alt="WASM">
</a>
<a href="https://central.sonatype.com/artifact/dev.kreuzberg/kreuzberg">
<img src="https://img.shields.io/maven-central/v/dev.kreuzberg/kreuzberg?label=Java&color=007ec6" alt="Java">
</a>
<a href="https://github.com/kreuzberg-dev/kreuzberg/tree/main/packages/go/v4">
<img src="https://img.shields.io/github/v/tag/kreuzberg-dev/kreuzberg?label=Go&color=007ec6&filter=v4*" alt="Go">
</a>
<a href="https://www.nuget.org/packages/Kreuzberg/">
<img src="https://img.shields.io/nuget/v/Kreuzberg?label=C%23&color=007ec6" alt="C#">
</a>
<a href="https://packagist.org/packages/kreuzberg/kreuzberg">
<img src="https://img.shields.io/packagist/v/kreuzberg/kreuzberg?label=PHP&color=007ec6" alt="PHP">
</a>
<a href="https://rubygems.org/gems/kreuzberg">
<img src="https://img.shields.io/gem/v/kreuzberg?label=Ruby&color=007ec6" alt="Ruby">
</a>
<a href="https://hex.pm/packages/kreuzberg">
<img src="https://img.shields.io/hexpm/v/kreuzberg?label=Elixir&color=007ec6" alt="Elixir">
</a>
<a href="https://kreuzberg-dev.r-universe.dev/kreuzberg">
<img src="https://img.shields.io/badge/R-kreuzberg-007ec6" alt="R">
</a>
<a href="https://pub.dev/packages/kreuzberg">
<img src="https://img.shields.io/pub/v/kreuzberg?label=Dart&color=007ec6" alt="Dart">
</a>
<a href="https://central.sonatype.com/artifact/dev.kreuzberg/kreuzberg-android">
<img src="https://img.shields.io/maven-central/v/dev.kreuzberg/kreuzberg-android?label=Kotlin&color=007ec6" alt="Kotlin">
</a>
<a href="https://github.com/kreuzberg-dev/kreuzberg/tree/main/packages/swift">
<img src="https://img.shields.io/badge/Swift-SPM-007ec6" alt="Swift">
</a>
<a href="https://github.com/kreuzberg-dev/kreuzberg/tree/main/packages/zig">
<img src="https://img.shields.io/badge/Zig-package-007ec6" alt="Zig">
</a>
<a href="https://github.com/kreuzberg-dev/kreuzberg/releases">
<img src="https://img.shields.io/badge/C-FFI-007ec6" alt="C FFI">
</a>
<a href="https://github.com/kreuzberg-dev/kreuzberg/pkgs/container/kreuzberg">
<img src="https://img.shields.io/badge/Docker-ghcr.io-007ec6?logo=docker&logoColor=white" alt="Docker">
</a>
<a href="https://github.com/kreuzberg-dev/kreuzberg/pkgs/container/charts%2Fkreuzberg">
<img src="https://img.shields.io/badge/Helm-ghcr.io-007ec6?logo=helm&logoColor=white" alt="Helm">
</a>
<!-- Project Info -->
<a href="https://github.com/kreuzberg-dev/kreuzberg/blob/main/LICENSE">
<img src="https://img.shields.io/badge/License-Elastic--2.0-007ec6" alt="License">
</a>
<a href="https://docs.kreuzberg.dev">
<img src="https://img.shields.io/badge/Docs-kreuzberg-007ec6" alt="Documentation">
</a>
<a href="https://huggingface.co/Kreuzberg">
<img src="https://img.shields.io/badge/Hugging%20Face-Kreuzberg-007ec6" alt="Hugging Face">
</a>
</div>
<div align="center" style="margin: 24px 0 0;">
<a href="https://kreuzberg.dev">
<img alt="Kreuzberg" src="https://github.com/user-attachments/assets/419fc06c-8313-4324-b159-4b4d3cfce5c0" />
</a>
</div>
<div align="center" style="display: flex; flex-wrap: wrap; gap: 12px; justify-content: center; margin: 28px 0 24px;">
<a href="https://discord.gg/xt9WY3GnKR">
<img height="22" src="https://img.shields.io/badge/Discord-Chat-007ec6?logo=discord&logoColor=white" alt="Join Discord">
</a>
<a href="https://docs.kreuzberg.dev/demo.html">
<img height="22" src="https://img.shields.io/badge/Live%20Demo-Open-007ec6?logo=webassembly&logoColor=white" alt="Live Demo">
</a>
</div>
High-performance document intelligence for Go backed by the Rust core that powers every Kreuzberg binding.
> **Version 4.10.0-rc.15**
> Report issues at [github.com/kreuzberg-dev/kreuzberg](https://github.com/kreuzberg-dev/kreuzberg/issues).
## Install
Kreuzberg Go binaries are **statically linked** — once built, they are self-contained and require no runtime library dependencies. Only the static library is needed at build time.
### Quick Start (Monorepo Development)
For development in the Kreuzberg monorepo:
```bash
# Build the static FFI library
cargo build -p kreuzberg-ffi --release
# Go build will automatically link against the static library
# (from target/release/libkreuzberg_ffi.a)
cd packages/go/v4
go build -v
# Run your application binary (no library path needed - it's statically linked)
./myapp
```
That's it! The resulting binary is self-contained and has no runtime dependencies on Kreuzberg libraries.
### Using Go Modules
To use this package via `go get`:
```bash
# Get the latest release
go get github.com/kreuzberg-dev/kreuzberg/v4@latest
# Or a specific version
go get github.com/kreuzberg-dev/kreuzberg/v4@v4.10.0-rc.15
```
You'll need to provide the static library at build time. See [Building with Static Libraries](#building-with-static-libraries) below.
### Building with Static Libraries
When building outside the Kreuzberg monorepo, you need to provide the static library (`.a` file on Unix, `.lib` on Windows).
#### Option 1: Download Pre-built Static Library
Download the static library for your platform from [GitHub Releases](https://github.com/kreuzberg-dev/kreuzberg/releases):
```bash
# Example: Linux x86_64
curl -LO https://github.com/kreuzberg-dev/kreuzberg/releases/download/v4.10.0-rc.15/go-ffi-linux-x86_64.tar.gz
tar -xzf go-ffi-linux-x86_64.tar.gz
# Copy to a permanent location
mkdir -p ~/kreuzberg/lib
cp kreuzberg-ffi/lib/libkreuzberg_ffi.a ~/kreuzberg/lib/
```
Then build with `CGO_LDFLAGS`:
```bash
# Linux/macOS
CGO_LDFLAGS="-L$HOME/kreuzberg/lib -lkreuzberg_ffi" go build
# Windows (MSVC)
set CGO_LDFLAGS=-L%USERPROFILE%\kreuzberg\lib -lkreuzberg_ffi
go build
```
#### Option 2: Build Static Library Yourself
If pre-built libraries aren't available for your platform:
```bash
# Clone the repository
git clone https://github.com/kreuzberg-dev/kreuzberg.git
cd kreuzberg
# Build the static library
cargo build -p kreuzberg-ffi --release
# The static library is now at: target/release/libkreuzberg_ffi.a
# Copy it to a permanent location
mkdir -p ~/kreuzberg/lib
cp target/release/libkreuzberg_ffi.a ~/kreuzberg/lib/
# Now you can build Go projects
cd ~/my-go-project
CGO_LDFLAGS="-L$HOME/kreuzberg/lib -lkreuzberg_ffi" go build
```
### System Requirements
#### ONNX Runtime (for embeddings)
If using embeddings functionality, ONNX Runtime must be installed **at build time**:
```bash
# macOS
brew install onnxruntime
# Ubuntu/Debian
sudo apt install libonnxruntime libonnxruntime-dev
# Windows (MSVC)
scoop install onnxruntime
# OR download from https://github.com/microsoft/onnxruntime/releases
```
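Depending on how the FFI library was built, ONNX Runtime ends up either statically or dynamically linked into the resulting binary; check the FFI build configuration. If it is linked dynamically, the ONNX Runtime shared library must also be present on every machine that runs the binary.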
**Note:** Windows MinGW builds do not support embeddings (ONNX Runtime requires MSVC). Use Windows MSVC for embeddings support.
## Quickstart
```go
package main

import (
	"fmt"
	"log"

	v4 "github.com/kreuzberg-dev/kreuzberg/v4"
)

func main() {
	result, err := v4.ExtractFileSync("document.pdf", nil)
	if err != nil {
		log.Fatalf("extract failed: %v", err)
	}
	fmt.Println("MIME:", result.MimeType)
	fmt.Println("First 200 bytes:")
	// Guard the slice so documents shorter than 200 bytes don't panic.
	fmt.Println(result.Content[:min(200, len(result.Content))])
}
```
Build and run:
```bash
# Build (make sure you have the static library available - see Install)
CGO_LDFLAGS="-L$HOME/kreuzberg/lib -lkreuzberg_ffi" go build
# Run - no library paths needed!
./myapp
```
The binary is self-contained and can be distributed without any Kreuzberg library dependencies.
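Slicing a Go string by byte offset, as in a `[:200]` preview, can cut a multi-byte UTF-8 character in half. A minimal sketch of a rune-aware alternative; `preview` is a hypothetical helper, not part of the Kreuzberg API:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// preview returns at most n runes of s without splitting a UTF-8 sequence.
// Hypothetical helper; not part of the Kreuzberg package.
func preview(s string, n int) string {
	if n <= 0 {
		return ""
	}
	count := 0
	// Ranging over a string yields the byte offset of each rune.
	for i := range s {
		if count == n {
			return s[:i]
		}
		count++
	}
	return s
}

func main() {
	fmt.Println(preview("Grüße aus Kreuzberg", 5))              // Grüße
	fmt.Println(utf8.RuneCountInString(preview("short", 200))) // 5
}
```

In the quickstart this would replace the byte slice with something like `preview(result.Content, 200)`.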
## Examples
### Extract bytes
```go
data, err := os.ReadFile("slides.pptx")
if err != nil {
	log.Fatal(err)
}
result, err := v4.ExtractBytesSync(data, "application/vnd.openxmlformats-officedocument.presentationml.presentation", nil)
if err != nil {
	log.Fatal(err)
}
fmt.Println(result.Metadata.FormatType())
```
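`ExtractBytesSync` takes the MIME type explicitly. When all you have is a filename, a small lookup table is one option. A minimal sketch; `mimeForPath` is a hypothetical helper, and the table only covers formats mentioned in this README (the MIME strings are the standard registered types):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// officeMIME maps lowercase file extensions to MIME types.
var officeMIME = map[string]string{
	".pdf":  "application/pdf",
	".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
	".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
	".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}

// mimeForPath returns the MIME type for the file extension of path, if known.
// Hypothetical helper; not part of the Kreuzberg package.
func mimeForPath(path string) (string, bool) {
	mt, ok := officeMIME[strings.ToLower(filepath.Ext(path))]
	return mt, ok
}

func main() {
	mt, ok := mimeForPath("slides.PPTX")
	fmt.Println(mt, ok)
}
```

The returned string can be passed as the second argument to `v4.ExtractBytesSync`; handle the `ok == false` case yourself for extensions the table does not know.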
### Use advanced configuration
```go
lang := "eng"
cfg := &v4.ExtractionConfig{
	UseCache:        true,
	ForceOCR:        false,
	ImageExtraction: &v4.ImageExtractionConfig{Enabled: true},
	OCR: &v4.OcrConfig{
		Backend:  "tesseract",
		Language: &lang,
	},
}
result, err := v4.ExtractFileSync("scanned.pdf", cfg)
if err != nil {
	log.Fatal(err)
}
fmt.Println("Content length:", len(result.Content))
```
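The `OcrConfig.Language` field above takes a pointer, which is why the example declares `lang` first. A common Go idiom for optional pointer fields is a tiny generic helper; `ptr` here is hypothetical and not part of the Kreuzberg package:

```go
package main

import "fmt"

// ptr returns a pointer to a copy of v, handy for optional (*T) config fields.
// Hypothetical helper; not part of the Kreuzberg package.
func ptr[T any](v T) *T {
	return &v
}

func main() {
	lang := ptr("eng")
	fmt.Println(*lang)
}
```

With it, the field can be written inline as `Language: ptr("eng")`. Each call returns a fresh pointer, so configs never share the same underlying value by accident.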
### Async (context-aware) extraction
```go
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
result, err := v4.ExtractFile(ctx, "large.pdf", nil)
if err != nil {
	log.Fatal(err)
}
fmt.Println("Content length:", len(result.Content))
```
### Batch extract
```go
paths := []string{"doc1.pdf", "doc2.docx", "report.xlsx"}
results, err := v4.BatchExtractFilesSync(paths, nil)
if err != nil {
	log.Fatal(err)
}
for i, res := range results {
	if res == nil {
		continue
	}
	fmt.Printf("[%d] %s => %d bytes\n", i, res.MimeType, len(res.Content))
}
```
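`BatchExtractFilesSync` processes the whole list in a single call. If you instead want a separate error per file or your own concurrency limit, a worker-pool sketch using only the standard library could look like this; `extractAll` and its `extract` callback are hypothetical, and in real code `extract` would wrap a single-file call such as `v4.ExtractFileSync`:

```go
package main

import (
	"fmt"
	"sync"
)

// extractAll runs extract on every path with at most `workers` calls in flight.
// Results and errors are returned in the same order as paths.
// Hypothetical helper; not part of the Kreuzberg package.
func extractAll[T any](paths []string, workers int, extract func(string) (T, error)) ([]T, []error) {
	if workers < 1 {
		workers = 1
	}
	results := make([]T, len(paths))
	errs := make([]error, len(paths))
	sem := make(chan struct{}, workers) // counting semaphore
	var wg sync.WaitGroup
	for i, p := range paths {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot before starting the goroutine
		go func(i int, p string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			// Each goroutine writes only its own index, so no extra locking is needed.
			results[i], errs[i] = extract(p)
		}(i, p)
	}
	wg.Wait()
	return results, errs
}

func main() {
	paths := []string{"doc1.pdf", "doc2.docx", "report.xlsx"}
	lengths, errs := extractAll(paths, 2, func(p string) (int, error) {
		return len(p), nil // stands in for a real extraction
	})
	fmt.Println(lengths, errs)
}
```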
### Register a validator
```go
// This file needs `import "C"` plus a cgo preamble that declares
// ValidatorCallback and customValidator (for example via the FFI header).

//export customValidator
func customValidator(resultJSON *C.char) *C.char {
	// Validate the JSON payload; return an error string, or nil if it is valid.
	return nil
}

func init() {
	if err := v4.RegisterValidator("go-validator", 50, (C.ValidatorCallback)(C.customValidator)); err != nil {
		log.Fatalf("validator registration failed: %v", err)
	}
}
```
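The callback receives the extraction result as a JSON string. Keeping the actual checks in a plain Go function makes them unit-testable without cgo. A sketch; `validateResultJSON` is hypothetical, and the `content` JSON field name is an assumption, so verify it against the JSON your Kreuzberg version actually passes in:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// extractionPayload mirrors the one field this validator inspects.
// ASSUMPTION: the "content" key is illustrative; check the real payload.
type extractionPayload struct {
	Content string `json:"content"`
}

// validateResultJSON returns nil if the payload passes, or an error whose
// message the exported cgo callback can hand back as a C string.
func validateResultJSON(payload []byte) error {
	var p extractionPayload
	if err := json.Unmarshal(payload, &p); err != nil {
		return fmt.Errorf("invalid result JSON: %w", err)
	}
	if strings.TrimSpace(p.Content) == "" {
		return errors.New("extracted content is empty")
	}
	return nil
}

func main() {
	fmt.Println(validateResultJSON([]byte(`{"content":"hello"}`)))
	fmt.Println(validateResultJSON([]byte(`{"content":"   "}`)))
}
```

Inside `customValidator`, one would convert the argument with `C.GoString(resultJSON)`, call this function, and return `C.CString(err.Error())` on failure; who frees that C string depends on the FFI contract, which this README does not specify.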
## API Reference
- **Source**: [packages/go/v4](https://github.com/kreuzberg-dev/kreuzberg/tree/main/packages/go/v4)
- **Full documentation**: [kreuzberg.dev](https://kreuzberg.dev) (configuration, formats, OCR backends)
## Troubleshooting
| Issue | Fix |
| ------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `ld returned 1 exit status` or `undefined reference to 'html_to_markdown_...'` | The static library wasn't found. Make sure `CGO_LDFLAGS` points to the directory containing `libkreuzberg_ffi.a`: `CGO_LDFLAGS="-L/path/to/lib -lkreuzberg_ffi" go build` |
| `cannot find -lkreuzberg_ffi` | The static library file is missing or in the wrong location. Download it from [GitHub Releases](https://github.com/kreuzberg-dev/kreuzberg/releases) or build it yourself: `cargo build -p kreuzberg-ffi --release` |
| `undefined: v4.ExtractFile` | This function was removed in v4.1.0. Use `ExtractFileSync` and wrap it in a goroutine if needed (see the migration guide). |
| `Missing dependency: tesseract` | Install the OCR backend and ensure it is on `PATH`. Errors bubble up as `*v4.MissingDependencyError`. |
| `undefined: C.customValidator` during build | Export the callback with `//export` in a `*_cgo.go` file before using it in `Register*` helpers. |
| `Missing dependency: onnxruntime` | Install ONNX Runtime at build time: `brew install onnxruntime` (macOS), `apt install libonnxruntime libonnxruntime-dev` (Linux), `scoop install onnxruntime` (Windows). Required for embeddings functionality. |
| Embeddings not available on Windows MinGW | Windows MinGW builds cannot link ONNX Runtime (MSVC-only). Use Windows MSVC build for embeddings support, or build without embeddings feature. |
## Testing / Tooling
- `task go:lint` runs `gofmt` and `golangci-lint` (`golangci-lint` pinned to v2.11.3).
- `task go:test` executes `go test ./...` (after building the static FFI library).
- `task e2e:go:verify` regenerates fixtures via the e2e generator and runs `go test ./...` inside `e2e/go`.
Need help? Join the [Discord](https://discord.gg/xt9WY3GnKR) or open an issue with logs, platform info, and the steps you tried.

packages/go/v4/go.mod Normal file

@@ -0,0 +1,3 @@
module github.com/kreuzberg-dev/kreuzberg/v4
go 1.26

packages/go/v5/LICENSE generated Normal file

@@ -0,0 +1,93 @@
Elastic License 2.0 (ELv2)
Copyright 2025-2026 Kreuzberg, Inc.
Acceptance
By using the software, you agree to all of the terms and conditions below.
Copyright License
The licensor grants you a non-exclusive, royalty-free, worldwide,
non-sublicensable, non-transferable license to use, copy, distribute, make
available, and prepare derivative works of the software, in each case subject to
the limitations and conditions below.
Limitations
You may not provide the software to third parties as a hosted or managed
service, where the service provides users with access to any substantial set of
the features or functionality of the software.
You may not move, change, disable, or circumvent the license key functionality
in the software, and you may not remove or obscure any functionality in the
software that is protected by the license key.
You may not alter, remove, or obscure any licensing, copyright, or other notices
of the licensor in the software. Any use of the licensor's trademarks is subject
to applicable law.
Patents
The licensor grants you a license, under any patent claims the licensor can
license, or becomes able to license, to make, have made, use, sell, offer for
sale, import and have imported the software, in each case subject to the
limitations and conditions in this license. This license does not cover any
patent claims that you cause to be infringed by modifications or additions to the
software. If you or your company make any written claim that the software
infringes or contributes to infringement of any patent, your patent license for
the software granted under these terms ends immediately. If your company makes
such a claim, your patent license ends immediately for work on behalf of your
company.
Notices
You must ensure that anyone who gets a copy of any part of the software from you
also gets a copy of these terms.
If you modify the software, you must include in any modified copies of the
software prominent notices stating that you have modified the software.
No Other Rights
These terms do not imply any licenses other than those expressly granted in
these terms.
Termination
If you use the software in violation of these terms, such use is not licensed,
and your licenses will automatically terminate. If the licensor provides you with
a notice of your violation, and you cease all violation of this license no later
than 30 days after you receive that notice, your licenses will be reinstated
retroactively. However, if you violate these terms after such reinstatement, any
additional violation of these terms will cause your licenses to terminate
automatically and permanently.
No Liability
As far as the law allows, the software comes as is, without any warranty or
condition, and the licensor will not be liable to you for any damages arising out
of these terms or the use or nature of the software, under any kind of legal
claim.
Definitions
The licensor is the entity offering these terms, and the software is the
software the licensor makes available under these terms, including any portion
of it.
you refers to the individual or entity agreeing to these terms.
your company is any legal entity, sole proprietorship, or other kind of
organization that you work for, plus all organizations that have control over,
are under the control of, or are under common control with that organization.
control means ownership of substantially all the assets of an entity, or the
power to direct its management and policies by vote, contract, or otherwise.
Control can be direct or indirect.
your licenses are all the licenses granted to you for the software under these
terms.
use means anything you do with the software requiring one of your licenses.
trademark means trademarks, service marks, and similar rights.

packages/go/v5/README.md generated Normal file

@@ -0,0 +1,358 @@
# Kreuzberg
<div align="center" style="display: flex; flex-wrap: wrap; gap: 8px; justify-content: center; margin: 20px 0;">
<a href="https://github.com/kreuzberg-dev/alef">
<img src="https://img.shields.io/badge/Bindings-alef%20%D7%90-007ec6" alt="Bindings">
</a>
<!-- Language Bindings -->
<a href="https://crates.io/crates/kreuzberg">
<img src="https://img.shields.io/crates/v/kreuzberg?label=Rust&color=007ec6" alt="Rust">
</a>
<a href="https://pypi.org/project/kreuzberg/">
<img src="https://img.shields.io/pypi/v/kreuzberg?label=Python&color=007ec6" alt="Python">
</a>
<a href="https://www.npmjs.com/package/@kreuzberg/node">
<img src="https://img.shields.io/npm/v/@kreuzberg/node?label=Node.js&color=007ec6" alt="Node.js">
</a>
<a href="https://www.npmjs.com/package/@kreuzberg/wasm">
<img src="https://img.shields.io/npm/v/@kreuzberg/wasm?label=WASM&color=007ec6" alt="WASM">
</a>
<a href="https://central.sonatype.com/artifact/dev.kreuzberg/kreuzberg">
<img src="https://img.shields.io/maven-central/v/dev.kreuzberg/kreuzberg?label=Java&color=007ec6" alt="Java">
</a>
<a href="https://github.com/kreuzberg-dev/kreuzberg/tree/main/packages/go/v5">
<img src="https://img.shields.io/github/v/tag/kreuzberg-dev/kreuzberg?label=Go&color=007ec6&filter=v5*" alt="Go">
</a>
<a href="https://www.nuget.org/packages/Kreuzberg/">
<img src="https://img.shields.io/nuget/v/Kreuzberg?label=C%23&color=007ec6" alt="C#">
</a>
<a href="https://packagist.org/packages/kreuzberg/kreuzberg">
<img src="https://img.shields.io/packagist/v/kreuzberg/kreuzberg?label=PHP&color=007ec6" alt="PHP">
</a>
<a href="https://rubygems.org/gems/kreuzberg">
<img src="https://img.shields.io/gem/v/kreuzberg?label=Ruby&color=007ec6" alt="Ruby">
</a>
<a href="https://hex.pm/packages/kreuzberg">
<img src="https://img.shields.io/hexpm/v/kreuzberg?label=Elixir&color=007ec6" alt="Elixir">
</a>
<a href="https://kreuzberg-dev.r-universe.dev/kreuzberg">
<img src="https://img.shields.io/badge/R-kreuzberg-007ec6" alt="R">
</a>
<a href="https://pub.dev/packages/kreuzberg">
<img src="https://img.shields.io/pub/v/kreuzberg?label=Dart&color=007ec6" alt="Dart">
</a>
<a href="https://central.sonatype.com/artifact/dev.kreuzberg/kreuzberg-android">
<img src="https://img.shields.io/maven-central/v/dev.kreuzberg/kreuzberg-android?label=Kotlin&color=007ec6" alt="Kotlin">
</a>
<a href="https://github.com/kreuzberg-dev/kreuzberg/tree/main/packages/swift">
<img src="https://img.shields.io/badge/Swift-SPM-007ec6" alt="Swift">
</a>
<a href="https://github.com/kreuzberg-dev/kreuzberg/tree/main/packages/zig">
<img src="https://img.shields.io/badge/Zig-package-007ec6" alt="Zig">
</a>
<a href="https://github.com/kreuzberg-dev/kreuzberg/releases">
<img src="https://img.shields.io/badge/C-FFI-007ec6" alt="C FFI">
</a>
<a href="https://github.com/kreuzberg-dev/kreuzberg/pkgs/container/kreuzberg">
<img src="https://img.shields.io/badge/Docker-ghcr.io-007ec6?logo=docker&logoColor=white" alt="Docker">
</a>
<a href="https://github.com/kreuzberg-dev/kreuzberg/pkgs/container/charts%2Fkreuzberg">
<img src="https://img.shields.io/badge/Helm-ghcr.io-007ec6?logo=helm&logoColor=white" alt="Helm">
</a>
<!-- Project Info -->
<a href="https://github.com/kreuzberg-dev/kreuzberg/blob/main/LICENSE">
<img src="https://img.shields.io/badge/License-Elastic--2.0-007ec6" alt="License">
</a>
<a href="https://docs.kreuzberg.dev">
<img src="https://img.shields.io/badge/Docs-kreuzberg-007ec6" alt="Documentation">
</a>
<a href="https://huggingface.co/Kreuzberg">
<img src="https://img.shields.io/badge/Hugging%20Face-Kreuzberg-007ec6" alt="Hugging Face">
</a>
</div>
<div align="center" style="margin: 24px 0 0;">
<a href="https://kreuzberg.dev">
<img alt="Kreuzberg" src="https://github.com/user-attachments/assets/419fc06c-8313-4324-b159-4b4d3cfce5c0" />
</a>
</div>
<div align="center" style="display: flex; flex-wrap: wrap; gap: 12px; justify-content: center; margin: 28px 0 24px;">
<a href="https://discord.gg/xt9WY3GnKR">
<img height="22" src="https://img.shields.io/badge/Discord-Chat-007ec6?logo=discord&logoColor=white" alt="Join Discord">
</a>
<a href="https://docs.kreuzberg.dev/demo.html">
<img height="22" src="https://img.shields.io/badge/Live%20Demo-Open-007ec6?logo=webassembly&logoColor=white" alt="Live Demo">
</a>
</div>
High-performance document intelligence for Go backed by the Rust core that powers every Kreuzberg binding.
> **Version 5.0.0-rc.3**
> Report issues at [github.com/kreuzberg-dev/kreuzberg](https://github.com/kreuzberg-dev/kreuzberg/issues).
## What This Package Provides
- **Go module over the Rust core** — context-aware extraction with Go structs and errors.
- **Structured results** — text, tables, images, metadata, language detection, chunks, and warnings.
- **Static-link workflow** — build against `kreuzberg-ffi` and ship a self-contained Go binary.
- **Cross-binding parity** — output matches the Python, Node.js, Ruby, Java, .NET, PHP, Elixir, R, Dart, Swift, Zig, WASM, and C FFI packages.
## Install
Kreuzberg Go binaries are **statically linked** — once built, they are self-contained and require no runtime library dependencies. Only the static library is needed at build time.
### Quick Start (Monorepo Development)
For development in the Kreuzberg monorepo:
```bash
# Build the static FFI library
cargo build -p kreuzberg-ffi --release
# Go build will automatically link against the static library
# (from target/release/libkreuzberg_ffi.a)
cd packages/go/v5
go build -v
# Run your application binary (no library path needed - it's statically linked)
./myapp
```
That's it! The resulting binary is self-contained and has no runtime dependencies on Kreuzberg libraries.
### Using Go Modules
To use this package via `go get`:
```bash
# Get the latest release
go get github.com/kreuzberg-dev/kreuzberg/v5@latest
# Or a specific version
go get github.com/kreuzberg-dev/kreuzberg/v5@v5.0.0-rc.3
```
You'll need to provide the static library at build time. See [Building with Static Libraries](#building-with-static-libraries) below.
### Building with Static Libraries
When building outside the Kreuzberg monorepo, you need to provide the static library (`.a` file on Unix, `.lib` on Windows).
#### Option 1: Download Pre-built Static Library
Download the static library for your platform from [GitHub Releases](https://github.com/kreuzberg-dev/kreuzberg/releases):
```bash
# Example: Linux x86_64
curl -LO https://github.com/kreuzberg-dev/kreuzberg/releases/download/v5.0.0-rc.3/go-ffi-linux-x86_64.tar.gz
tar -xzf go-ffi-linux-x86_64.tar.gz
# Copy to a permanent location
mkdir -p ~/kreuzberg/lib
cp kreuzberg-ffi/lib/libkreuzberg_ffi.a ~/kreuzberg/lib/
```
Then build with `CGO_LDFLAGS`:
```bash
# Linux/macOS
CGO_LDFLAGS="-L$HOME/kreuzberg/lib -lkreuzberg_ffi" go build
# Windows (MSVC)
set CGO_LDFLAGS=-L%USERPROFILE%\kreuzberg\lib -lkreuzberg_ffi
go build
```
#### Option 2: Build Static Library Yourself
If pre-built libraries aren't available for your platform:
```bash
# Clone the repository
git clone https://github.com/kreuzberg-dev/kreuzberg.git
cd kreuzberg
# Build the static library
cargo build -p kreuzberg-ffi --release
# The static library is now at: target/release/libkreuzberg_ffi.a
# Copy it to a permanent location
mkdir -p ~/kreuzberg/lib
cp target/release/libkreuzberg_ffi.a ~/kreuzberg/lib/
# Now you can build Go projects
cd ~/my-go-project
CGO_LDFLAGS="-L$HOME/kreuzberg/lib -lkreuzberg_ffi" go build
```
### System Requirements
#### ONNX Runtime (for embeddings)
If using embeddings functionality, ONNX Runtime must be installed **at build time**:
```bash
# macOS
brew install onnxruntime
# Ubuntu/Debian
sudo apt install libonnxruntime libonnxruntime-dev
# Windows (MSVC)
scoop install onnxruntime
# OR download from https://github.com/microsoft/onnxruntime/releases
```
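Depending on how the FFI library was built, ONNX Runtime ends up either statically or dynamically linked into the resulting binary; check the FFI build configuration. If it is linked dynamically, the ONNX Runtime shared library must also be present on every machine that runs the binary.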
**Note:** Windows MinGW builds do not support embeddings (ONNX Runtime requires MSVC). Use Windows MSVC for embeddings support.
## Quickstart
```go
package main

import (
	"fmt"
	"log"

	kreuzberg "github.com/kreuzberg-dev/kreuzberg/v5"
)

func main() {
	result, err := kreuzberg.ExtractFileSync("document.pdf", nil)
	if err != nil {
		log.Fatalf("extract failed: %v", err)
	}
	fmt.Println("MIME:", result.MimeType)
	fmt.Println("First 200 bytes:")
	// Guard the slice so documents shorter than 200 bytes don't panic.
	fmt.Println(result.Content[:min(200, len(result.Content))])
}
```
Build and run:
```bash
# Build (make sure you have the static library available - see Install)
CGO_LDFLAGS="-L$HOME/kreuzberg/lib -lkreuzberg_ffi" go build
# Run - no library paths needed!
./myapp
```
The binary is self-contained and can be distributed without any Kreuzberg library dependencies.
## Examples
### Extract bytes
```go
data, err := os.ReadFile("slides.pptx")
if err != nil {
	log.Fatal(err)
}
result, err := kreuzberg.ExtractBytesSync(data, "application/vnd.openxmlformats-officedocument.presentationml.presentation", nil)
if err != nil {
	log.Fatal(err)
}
fmt.Println(result.Metadata.FormatType())
```
### Use advanced configuration
```go
lang := "eng"
cfg := &kreuzberg.ExtractionConfig{
	UseCache:        true,
	ForceOCR:        false,
	ImageExtraction: &kreuzberg.ImageExtractionConfig{Enabled: true},
	OCR: &kreuzberg.OcrConfig{
		Backend:  "tesseract",
		Language: &lang,
	},
}
result, err := kreuzberg.ExtractFileSync("scanned.pdf", cfg)
if err != nil {
	log.Fatal(err)
}
fmt.Println("Content length:", len(result.Content))
```
### Async (context-aware) extraction
```go
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
result, err := kreuzberg.ExtractFile(ctx, "large.pdf", nil)
if err != nil {
	log.Fatal(err)
}
fmt.Println("Content length:", len(result.Content))
```
### Batch extract
```go
paths := []string{"doc1.pdf", "doc2.docx", "report.xlsx"}
results, err := kreuzberg.BatchExtractFilesSync(paths, nil)
if err != nil {
	log.Fatal(err)
}
for i, res := range results {
	if res == nil {
		continue
	}
	fmt.Printf("[%d] %s => %d bytes\n", i, res.MimeType, len(res.Content))
}
```
### Register a validator
```go
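// Place this in a cgo file. A file that uses //export may only declare,
// not define, C functions in its preamble.

/*
extern char* customValidator(char* resultJSON);
*/
import "C"

//export customValidator
func customValidator(resultJSON *C.char) *C.char {
	// Validate the JSON payload; return an error string, or nil (NULL) if it is valid.
	return nil
}

func init() {
	if err := kreuzberg.RegisterValidator("go-validator", 50, (C.ValidatorCallback)(C.customValidator)); err != nil {
		log.Fatalf("validator registration failed: %v", err)
	}
}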
```
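Keeping the validation logic in plain Go makes it testable without cgo; the exported callback then only has to convert `*C.char` to a Go string (for example with `C.GoString`) and hand it over. The sketch assumes the result JSON has a top-level `content` string field. That field name and the `validateResultJSON` helper are assumptions, so check the actual payload before relying on them.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// validateResultJSON returns "" when the payload is acceptable, or an error
// message otherwise, mirroring the "error string or NULL" callback contract.
// Hypothetical: assumes a top-level "content" string field.
func validateResultJSON(payload string) string {
	var result struct {
		Content *string `json:"content"`
	}
	if err := json.Unmarshal([]byte(payload), &result); err != nil {
		return fmt.Sprintf("invalid result JSON: %v", err)
	}
	if result.Content == nil {
		return "missing content field"
	}
	if strings.TrimSpace(*result.Content) == "" {
		return "extracted content is empty"
	}
	return ""
}

func main() {
	for _, p := range []string{`{"content":"Hello"}`, `{"content":"  "}`, `{}`, `not json`} {
		fmt.Printf("%-22s -> %q\n", p, validateResultJSON(p))
	}
}
```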
## API Reference
- **GoDoc**: [pkg.go.dev/github.com/kreuzberg-dev/kreuzberg/v5](https://pkg.go.dev/github.com/kreuzberg-dev/kreuzberg/v5)
- **Full documentation**: [kreuzberg.dev](https://kreuzberg.dev) (configuration, formats, OCR backends)
## Troubleshooting
| Issue | Fix |
| ------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `ld returned 1 exit status` or `undefined reference to 'html_to_markdown_...'` | The static library wasn't found. Make sure `CGO_LDFLAGS` points to the directory containing `libkreuzberg_ffi.a`: `CGO_LDFLAGS="-L/path/to/lib -lkreuzberg_ffi" go build` |
| `cannot find -lkreuzberg_ffi` | The static library file is missing or in the wrong location. Download it from [GitHub Releases](https://github.com/kreuzberg-dev/kreuzberg/releases) or build it yourself: `cargo build -p kreuzberg-ffi --release` |
| `undefined: v4.ExtractFile` | This function was removed in v4.1.0. Use `ExtractFileSync` and wrap it in a goroutine if needed (see the migration guide). |
| `Missing dependency: tesseract` | Install the OCR backend and make sure it is on `PATH`. The error is returned as `*kreuzberg.MissingDependencyError`. |
| `undefined: C.customValidator` during build | Export the callback with `//export` in a `*_cgo.go` file before using it in `Register*` helpers. |
| `Missing dependency: onnxruntime` | Install ONNX Runtime at build time: `brew install onnxruntime` (macOS), `apt install libonnxruntime libonnxruntime-dev` (Linux), `scoop install onnxruntime` (Windows). Required for embeddings functionality. |
| Embeddings not available on Windows MinGW | Windows MinGW builds cannot link ONNX Runtime (MSVC-only). Use a Windows MSVC build for embeddings support, or build without the embeddings feature. |
## Testing / Tooling
- `task go:lint` runs `gofmt` and `golangci-lint` (`golangci-lint` pinned to v2.11.3).
- `task go:test` executes `go test ./...` (after building the static FFI library).
- `task e2e:go:verify` regenerates fixtures via the e2e generator and runs `go test ./...` inside `e2e/go`.
Need help? Join the [Discord](https://discord.gg/xt9WY3GnKR) or open an issue with logs, platform info, and the steps you tried.
## Part of Kreuzberg.dev
- [Kreuzberg Cloud](https://github.com/kreuzberg-dev/kreuzberg-cloud) — managed extraction API with SDKs, dashboards, and observability.
- [kreuzcrawl](https://github.com/kreuzberg-dev/kreuzcrawl) — web crawling and scraping with HTML→Markdown and headless-Chrome fallback.
- [html-to-markdown](https://github.com/kreuzberg-dev/html-to-markdown) — fast, lossless HTML→Markdown engine.
- [liter-llm](https://github.com/kreuzberg-dev/liter-llm) — universal LLM API client with native bindings for 14 languages and 143 providers.
- [tree-sitter-language-pack](https://github.com/kreuzberg-dev/tree-sitter-language-pack) — tree-sitter grammars and code-intelligence primitives.
- [alef](https://github.com/kreuzberg-dev/alef) — the polyglot binding generator that produces this README and all per-language bindings.
- [Discord](https://discord.gg/xt9WY3GnKR) — community, roadmap, announcements.

7847
packages/go/v5/binding.go generated Normal file

File diff suppressed because it is too large

184
packages/go/v5/cmd/download_ffi/main.go generated Normal file

@@ -0,0 +1,184 @@
// Tool to download platform-specific FFI libraries from GitHub releases.
// Invoked via `go generate` before compilation.
//go:build ignore
// +build ignore
package main
import (
"archive/tar"
"compress/gzip"
"fmt"
"io"
"net/http"
"os"
"path/filepath"
"runtime"
"strings"
)
const (
moduleVersion = "5.0.0-rc.3"
repoURL = "https://github.com/kreuzberg-dev/kreuzberg"
assetPrefix = "kreuzberg"
)
func main() {
if err := run(); err != nil {
fmt.Fprintf(os.Stderr, "Error: %v\n", err)
os.Exit(1)
}
}
func run() error {
// Determine cache and library paths
cacheBase, libPath, err := determinePaths()
if err != nil {
return err
}
// Check if library already exists
if _, err := os.Stat(libPath); err == nil {
// Library already cached, nothing to do
return nil
}
// Download the FFI library tarball
if err := downloadAndExtractLibrary(cacheBase); err != nil {
return fmt.Errorf("failed to download FFI library: %w", err)
}
return nil
}
func determinePaths() (string, string, error) {
goos := runtime.GOOS
goarch := runtime.GOARCH
// Map Go platform names to asset names
osName := goos
if goos == "darwin" {
osName = "macos"
}
libName := "kreuzberg_ffi"
// Use the current working directory as the module root (where this script is run from)
// and place libraries in .lib/{os}-{arch}/
moduleRoot, err := os.Getwd()
if err != nil {
return "", "", fmt.Errorf("cannot determine module root: %w", err)
}
libDir := filepath.Join(moduleRoot, ".lib", fmt.Sprintf("%s-%s", osName, goarch))
libPath := filepath.Join(libDir, libFilename(libName, goos))
return libDir, libPath, nil
}
func libFilename(libName, goos string) string {
switch goos {
case "windows":
return libName + ".dll"
case "darwin":
return "lib" + libName + ".dylib"
default:
return "lib" + libName + ".so"
}
}
func downloadAndExtractLibrary(cacheDir string) error {
goos := runtime.GOOS
goarch := runtime.GOARCH
osName := goos
if goos == "darwin" {
osName = "macos"
}
// Map Go arch names to the alef platform names used in release asset filenames.
// The local .lib/<os>-<goarch>/ directories use Go arch names (matching cgo LDFLAGS),
// but alef's packager emits tarballs with its own arch names: x86_64, aarch64.
archName := goarch
switch goarch {
case "amd64":
archName = "x86_64"
case "arm64":
// macOS arm64 stays "arm64" (alef go_java_platform special-cases it);
// all other platforms use "aarch64".
if goos != "darwin" {
archName = "aarch64"
}
}
// Clean version for asset name
version := strings.TrimPrefix(moduleVersion, "v")
assetName := fmt.Sprintf("%s-go-v%s-%s-%s.tar.gz", assetPrefix, version, osName, archName)
downloadURL := fmt.Sprintf("%s/releases/download/v%s/%s", repoURL, version, assetName)
// Create cache directory
if err := os.MkdirAll(cacheDir, 0755); err != nil {
return fmt.Errorf("mkdir cache: %w", err)
}
// Download tarball
resp, err := http.Get(downloadURL)
if err != nil {
return fmt.Errorf("download %s: %w", downloadURL, err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
body, _ := io.ReadAll(resp.Body)
return fmt.Errorf("HTTP %d: %s", resp.StatusCode, string(body))
}
// Extract tarball to cache directory
if err := extractTarGz(resp.Body, cacheDir); err != nil {
return fmt.Errorf("extract tarball: %w", err)
}
return nil
}
func extractTarGz(src io.Reader, dstDir string) error {
gzr, err := gzip.NewReader(src)
if err != nil {
return err
}
defer gzr.Close()
tr := tar.NewReader(gzr)
for {
header, err := tr.Next()
if err == io.EOF {
break
}
if err != nil {
return err
}
// Construct destination path, stripping any leading directory
// from the tarball (e.g., "staging/lib..." -> "lib...")
targetPath := filepath.Join(dstDir, filepath.Base(header.Name))
switch header.Typeflag {
case tar.TypeReg:
f, err := os.OpenFile(targetPath, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, os.FileMode(header.Mode))
if err != nil {
return err
}
if _, err := io.Copy(f, tr); err != nil {
f.Close()
return err
}
// Check Close: buffered write errors may only surface here.
if err := f.Close(); err != nil {
return err
}
case tar.TypeDir:
if err := os.MkdirAll(targetPath, os.FileMode(header.Mode)); err != nil {
return err
}
}
}
return nil
}

18
packages/go/v5/embed_ffi.go generated Normal file

@@ -0,0 +1,18 @@
//go:build ignore
// +build ignore
package kreuzberg
import "embed"
// This file ensures that FFI header files and library artifacts are included
// when this module is vendored. The //go:embed directive tells Go to include
// the include/ and .lib/ directories when running `go mod vendor`.
//
// This file itself is not compiled (//go:build ignore), but its directives are
// processed by Go's module system to ensure all necessary files are present in
// vendored installations.
//go:embed include/*
//go:embed .lib/*
var _ embed.FS

7
packages/go/v5/generate.go generated Normal file

@@ -0,0 +1,7 @@
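//go:build generate

// This file's sole purpose is to hold the go:generate directive that downloads
// the platform-specific FFI library from GitHub releases. `go generate` sets the
// "generate" build tag, so this file is scanned by `go generate` but excluded
// from normal builds. The tool is run by file name because
// cmd/download_ffi/main.go is tagged `//go:build ignore`, and build constraints
// are only skipped for .go files named explicitly on the command line.
package kreuzberg

//go:generate go run ./cmd/download_ffi/main.go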

3
packages/go/v5/go.mod generated Normal file

@@ -0,0 +1,3 @@
module github.com/kreuzberg-dev/kreuzberg/v5
go 1.26

13439
packages/go/v5/include/kreuzberg.h generated Normal file

File diff suppressed because it is too large

1987
packages/go/v5/trait_bridges.go generated Normal file

File diff suppressed because it is too large