fil/docker/README.md
Henrik Jess Nielsen b4c07d3693
2026-06-01 23:40:55 +02:00


Kreuzberg Docker Images

This directory contains Dockerfile variants for building Kreuzberg Docker images with different feature sets.

Base Image

Both variants use debian:trixie-slim, the slim flavor of Debian 13 (Trixie). It is the current stable Debian release, chosen for package availability and security updates.

Image Variants

1. Core Image (Dockerfile.core)

  • Size: ~1.0-1.3GB
  • Base: debian:trixie-slim
  • Features: PDF, DOCX, PPTX, images, HTML, XML, text, Excel, email, academic formats (LaTeX, EPUB, etc.)
  • OCR: Tesseract (12 languages)
  • Legacy Office: Native OLE/CFB parsing support

When to use:

  • Production deployments where image size matters
  • Cloud environments with size/bandwidth constraints
  • Kubernetes deployments with frequent pod scaling
  • All use cases (both images have equivalent legacy Office support)

Build command:

docker build -f docker/Dockerfile.core -t kreuzberg:core .

2. Full Image (Dockerfile.full)

  • Size: ~1.0-1.3GB
  • Base: debian:trixie-slim
  • Features: All core features with native legacy Office format support
  • OCR: Tesseract (12 languages)
  • Legacy Office: Native OLE/CFB parsing for .doc, .ppt, .xls

When to use:

  • Complete document intelligence pipeline with all optional dependencies
  • Development and testing environments
  • When you want maximum feature completeness

Build command:

docker build -f docker/Dockerfile.full -t kreuzberg:full .

Size Comparison

Component              Core         Full         Difference
Base (trixie-slim)     ~120MB       ~120MB       -
Tesseract + 12 langs   ~250MB       ~250MB       -
Rust binary            ~80MB        ~80MB        -
System libraries       ~100MB       ~100MB       -
Total (approx)         ~1.0-1.3GB   ~1.0-1.3GB   - (same size)
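
The itemized rows add up to well under the approximate total, so the remainder presumably comes from layers the table does not list. The sketch below does that arithmetic and shows one way to check the real sizes of locally built images; `list_image_sizes` is a hypothetical helper name, and it assumes the local `kreuzberg:core` and `kreuzberg:full` tags from the build commands above.

```shell
# Sum of the itemized rows in the table (values in MB, taken from the table).
itemized_mb=$((120 + 250 + 80 + 100))
echo "Itemized components: ~${itemized_mb}MB"

# Hypothetical helper: list locally built kreuzberg images with their sizes.
# Uses the standard `docker image ls` Go-template format option.
list_image_sizes() {
  docker image ls --format '{{.Repository}}:{{.Tag}}  {{.Size}}' kreuzberg
}
```

Call `list_image_sizes` after building both variants to compare the measured sizes against the estimates.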

Default Image

The root Dockerfile is a symlink to Dockerfile.full for backward compatibility and complete feature support by default.

Multi-Architecture Support

Both images support:

  • linux/amd64 (x86_64)
  • linux/arm64 (aarch64)

Both architectures use the same pure-Rust PDF library, so no architecture-specific binaries are needed.
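
To build for a specific architecture locally, something along these lines may work. It is a minimal sketch, assuming Docker Buildx is installed and, for a non-native platform, that QEMU/binfmt emulation is set up on the host. `build_for_platform` is a hypothetical helper name; `--load` imports the result into the local image store, which with the default builder works for one platform at a time.

```shell
# Hypothetical helper: build one variant for one target platform.
build_for_platform() {
  variant="$1"    # core or full
  platform="$2"   # linux/amd64 or linux/arm64
  docker buildx build \
    --platform "$platform" \
    -f "docker/Dockerfile.${variant}" \
    -t "kreuzberg:${variant}" \
    --load \
    .
}
```

For example, `build_for_platform core linux/arm64` builds the core image for arm64 and tags it kreuzberg:core.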

Usage Modes

All images support three execution modes via ENTRYPOINT:

1. API Server (default)

docker run -p 8000:8000 kreuzberg:core
# or override host/port:
docker run -p 8000:8000 kreuzberg:core serve --host 0.0.0.0 --port 8000
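
When starting the API server from a script, it can help to wait until the port answers before sending requests. The helper below is a sketch, not part of Kreuzberg: `wait_for_http` is a hypothetical name, it assumes `curl` is available on the host, and it treats any HTTP response (regardless of status code) as "up", so it does not rely on a particular endpoint existing.

```shell
# Hypothetical helper: poll a URL until the server responds or attempts run out.
# Usage: wait_for_http URL [ATTEMPTS]
wait_for_http() {
  url="$1"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Without --fail, curl exits 0 for any HTTP response, including 404.
    if curl --silent --output /dev/null --max-time 2 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

A typical sequence would be to start the container detached with `docker run -d --rm -p 8000:8000 kreuzberg:core`, then run `wait_for_http http://localhost:8000/ && echo "API is up"`.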

2. CLI Mode

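# Extract content from a single document
docker run -v "$(pwd)":/data kreuzberg:core extract /data/document.pdf
# Detect the type of a file
docker run -v "$(pwd)":/data kreuzberg:core detect /data/file.bin
# Process several files; note that the host shell, not the container, evaluates /data/*.pdf
docker run -v "$(pwd)":/data kreuzberg:core batch /data/*.pdf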
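
The glob in the batch example is evaluated by the host shell, where /data usually does not exist. One way to sidestep that is to expand the pattern on the host and translate each file to its path inside the container. This is a sketch using only the `extract` subcommand shown above; `extract_all_pdfs` is a hypothetical helper name, and the directory should be an absolute path, since Docker treats a bare relative name as a named volume.

```shell
# Hypothetical helper: run `extract` once per PDF found in a host directory.
# Usage: extract_all_pdfs /absolute/path/to/dir
extract_all_pdfs() {
  dir="$1"
  for f in "$dir"/*.pdf; do
    # Skip the literal pattern when no PDF matches.
    [ -e "$f" ] || continue
    # The host directory is mounted at /data, so map the file name into it.
    docker run --rm -v "$dir":/data kreuzberg:core \
      extract "/data/$(basename "$f")" || return 1
  done
}
```

For example, `extract_all_pdfs "$(pwd)"` processes every PDF in the current directory, one container run per file.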

3. MCP Server Mode

docker run kreuzberg:core mcp
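
If the MCP server communicates over stdin/stdout (an assumption here, not something this README states), the container needs its stdin kept open, which `docker run` only does with `-i`. A minimal sketch, with `run_mcp` as a hypothetical helper name:

```shell
# Hypothetical helper: run the MCP server attached to this process's stdin/stdout.
# Assumes a stdio transport; -i keeps stdin open, --rm removes the container on exit.
run_mcp() {
  image="${1:-kreuzberg:core}"
  docker run -i --rm "$image" mcp
}
```

An MCP client would then be configured to launch the equivalent `docker run -i --rm kreuzberg:core mcp` command.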

Testing

Test scripts are provided to verify both image variants:

# Test core image
IMAGE_NAME=kreuzberg:core ./scripts/test_docker.sh

# Test full image
IMAGE_NAME=kreuzberg:full ./scripts/test_docker.sh
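
To run both checks in one go, a small wrapper like the following could be used. It is a sketch that assumes the test script exits non-zero on failure; `test_all_variants` and the `TEST_SCRIPT` override are hypothetical names introduced for illustration.

```shell
# Hypothetical wrapper: run the test script against both image variants.
# TEST_SCRIPT can point at a different script; it defaults to the one above.
test_all_variants() {
  failed=0
  for variant in core full; do
    echo "Testing kreuzberg:${variant}"
    if ! IMAGE_NAME="kreuzberg:${variant}" "${TEST_SCRIPT:-./scripts/test_docker.sh}"; then
      echo "FAILED: kreuzberg:${variant}" >&2
      failed=1
    fi
  done
  # Non-zero if any variant failed, so the wrapper can gate a CI step.
  return "$failed"
}
```

Running `test_all_variants` from the repository root tests the core image first and then the full image, and continues to the second even if the first fails.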

GitHub Actions

The .github/workflows/publish-docker.yaml workflow builds and publishes both variants to GitHub Container Registry:

  • ghcr.io/kreuzberg-dev/kreuzberg:VERSION-core - Core image (minimal runtime)
  • ghcr.io/kreuzberg-dev/kreuzberg:core - Latest core image
  • ghcr.io/kreuzberg-dev/kreuzberg:VERSION - Full image (all optional dependencies)
  • ghcr.io/kreuzberg-dev/kreuzberg:latest - Latest full image

For local development, use the local tags shown in the build commands above.
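
The tag scheme above can be captured in a small helper when scripting pulls. This is a sketch: `image_ref` is a hypothetical function, the version `1.2.3` in the usage note is a placeholder rather than a real release, and whether published versions carry a `v` prefix is not stated here.

```shell
REGISTRY_REPO="ghcr.io/kreuzberg-dev/kreuzberg"

# Hypothetical helper: map a variant and optional version to a published tag.
# Usage: image_ref core|full [VERSION]
image_ref() {
  variant="$1"
  version="${2:-}"
  case "$variant" in
    core)
      # Core images are tagged VERSION-core, or plain "core" for the latest.
      if [ -n "$version" ]; then
        echo "${REGISTRY_REPO}:${version}-core"
      else
        echo "${REGISTRY_REPO}:core"
      fi
      ;;
    full)
      # Full images are tagged VERSION, or "latest" when no version is given.
      echo "${REGISTRY_REPO}:${version:-latest}"
      ;;
    *)
      echo "unknown variant: $variant" >&2
      return 1
      ;;
  esac
}
```

For example, `docker pull "$(image_ref core)"` pulls the latest core image, and `image_ref full 1.2.3` prints the reference for a pinned full image.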

Recommendations

Choose Core if you need:

  • A minimal runtime setup
  • Standard document processing
  • Cloud deployments with cost constraints
  • Kubernetes or other container orchestration

Choose Full if you want:

  • The maximum set of optional dependencies preinstalled
  • An image for development and testing environments
  • A "batteries included" experience