Kreuzberg Docker Images
This directory contains Dockerfile variants for building Kreuzberg Docker images with different feature sets.
Base Image
Both variants use Debian 13 (Trixie) slim - the latest stable Debian release for optimal package availability and security updates.
Image Variants
1. Core Image (Dockerfile.core)
Size: ~1.0-1.3GB Base: debian:trixie-slim Features: PDF, DOCX, PPTX, images, HTML, XML, text, Excel, email, academic formats (LaTeX, EPUB, etc.) OCR: Tesseract (12 languages) Legacy Office: Native OLE/CFB parsing support
When to use:
- Production deployments where image size matters
- Cloud environments with size/bandwidth constraints
- Kubernetes deployments with frequent pod scaling
- All use cases (both images have equivalent legacy Office support)
Build command:
docker build -f docker/Dockerfile.core -t kreuzberg:core .
2. Full Image (Dockerfile.full)
Size: ~1.0-1.3GB Base: debian:trixie-slim Features: All core features with native legacy Office format support OCR: Tesseract (12 languages) Legacy Office: Native OLE/CFB parsing for .doc, .ppt, .xls
When to use:
- Complete document intelligence pipeline with all optional dependencies
- Development and testing environments
- When you want maximum feature completeness
Build command:
docker build -f docker/Dockerfile.full -t kreuzberg:full .
Size Comparison
| Component | Core | Full | Difference |
|---|---|---|---|
| Base (trixie-slim) | ~120MB | ~120MB | - |
| Tesseract + 12 langs | ~250MB | ~250MB | - |
| Rust binary | ~80MB | ~80MB | - |
| System libraries | ~100MB | ~100MB | - |
| Total (approx) | ~1.0-1.3GB | ~1.0-1.3GB | - (same size) |
Default Image
The root Dockerfile is a symlink to Dockerfile.full for backward compatibility and complete feature support by default.
Multi-Architecture Support
Both images support:
linux/amd64(x86_64)linux/arm64(aarch64)
Both architectures use the same pure-Rust PDF library — no architecture-specific binaries needed.
Usage Modes
All images support three execution modes via ENTRYPOINT:
1. API Server (default)
docker run -p 8000:8000 kreuzberg:core
# or override host/port:
docker run -p 8000:8000 kreuzberg:core serve --host 0.0.0.0 --port 8000
2. CLI Mode
docker run -v $(pwd):/data kreuzberg:core extract /data/document.pdf
docker run -v $(pwd):/data kreuzberg:core detect /data/file.bin
docker run -v $(pwd):/data kreuzberg:core batch /data/*.pdf
3. MCP Server Mode
docker run kreuzberg:core mcp
Testing
Test scripts are provided to verify both image variants:
# Test core image
IMAGE_NAME=kreuzberg:core ./scripts/test_docker.sh
# Test full image
IMAGE_NAME=kreuzberg:full ./scripts/test_docker.sh
GitHub Actions
The .github/workflows/publish-docker.yaml workflow builds and publishes both variants to GitHub Container Registry:
ghcr.io/kreuzberg-dev/kreuzberg:VERSION-core- Core image (minimal runtime)ghcr.io/kreuzberg-dev/kreuzberg:core- Latest core imageghcr.io/kreuzberg-dev/kreuzberg:VERSION- Full image (all optional dependencies)ghcr.io/kreuzberg-dev/kreuzberg:latest- Latest full image
For local development, use the local tags shown in the build commands above.
Recommendations
Choose Core if:
- ✅ Minimal runtime setup
- ✅ Standard document processing needs
- ✅ Cloud deployments with cost constraints
- ✅ Kubernetes or container orchestration
Choose Full if:
- ✅ Want maximum optional dependencies preinstalled
- ✅ Development and testing environments
- ✅ "Batteries included" experience preferred