# Kreuzberg Docker Images This directory contains Dockerfile variants for building Kreuzberg Docker images with different feature sets. ## Base Image Both variants use **Debian 13 (Trixie) slim** - the latest stable Debian release for optimal package availability and security updates. ## Image Variants ### 1. Core Image (`Dockerfile.core`) **Size:** ~1.0-1.3GB **Base:** debian:trixie-slim **Features:** PDF, DOCX, PPTX, images, HTML, XML, text, Excel, email, academic formats (LaTeX, EPUB, etc.) **OCR:** Tesseract (12 languages) **Legacy Office:** Native OLE/CFB parsing support **When to use:** - Production deployments where image size matters - Cloud environments with size/bandwidth constraints - Kubernetes deployments with frequent pod scaling - All use cases (both images have equivalent legacy Office support) **Build command:** ```bash docker build -f docker/Dockerfile.core -t kreuzberg:core . ``` ### 2. Full Image (`Dockerfile.full`) **Size:** ~1.0-1.3GB **Base:** debian:trixie-slim **Features:** All core features with native legacy Office format support **OCR:** Tesseract (12 languages) **Legacy Office:** Native OLE/CFB parsing for .doc, .ppt, .xls **When to use:** - Complete document intelligence pipeline with all optional dependencies - Development and testing environments - When you want maximum feature completeness **Build command:** ```bash docker build -f docker/Dockerfile.full -t kreuzberg:full . ``` ## Size Comparison | Component | Core | Full | Difference | | -------------------- | -------------- | -------------- | ----------------- | | Base (trixie-slim) | ~120MB | ~120MB | - | | Tesseract + 12 langs | ~250MB | ~250MB | - | | Rust binary | ~80MB | ~80MB | - | | System libraries | ~100MB | ~100MB | - | | **Total (approx)** | **~1.0-1.3GB** | **~1.0-1.3GB** | **- (same size)** | ## Default Image The root `Dockerfile` is a symlink to `Dockerfile.full` for backward compatibility and complete feature support by default. ## Multi-Architecture Support Both images support: - `linux/amd64` (x86_64) - `linux/arm64` (aarch64) Both architectures use the same pure-Rust PDF library — no architecture-specific binaries needed. ## Usage Modes All images support three execution modes via ENTRYPOINT: ### 1. API Server (default) ```bash docker run -p 8000:8000 kreuzberg:core # or override host/port: docker run -p 8000:8000 kreuzberg:core serve --host 0.0.0.0 --port 8000 ``` ### 2. CLI Mode ```bash docker run -v $(pwd):/data kreuzberg:core extract /data/document.pdf docker run -v $(pwd):/data kreuzberg:core detect /data/file.bin docker run -v $(pwd):/data kreuzberg:core batch /data/*.pdf ``` ### 3. MCP Server Mode ```bash docker run kreuzberg:core mcp ``` ## Testing Test scripts are provided to verify both image variants: ```bash # Test core image IMAGE_NAME=kreuzberg:core ./scripts/test_docker.sh # Test full image IMAGE_NAME=kreuzberg:full ./scripts/test_docker.sh ``` ## GitHub Actions The `.github/workflows/publish-docker.yaml` workflow builds and publishes both variants to GitHub Container Registry: - `ghcr.io/kreuzberg-dev/kreuzberg:VERSION-core` - Core image (minimal runtime) - `ghcr.io/kreuzberg-dev/kreuzberg:core` - Latest core image - `ghcr.io/kreuzberg-dev/kreuzberg:VERSION` - Full image (all optional dependencies) - `ghcr.io/kreuzberg-dev/kreuzberg:latest` - Latest full image For local development, use the local tags shown in the build commands above. ## Recommendations **Choose Core if:** - ✅ Minimal runtime setup - ✅ Standard document processing needs - ✅ Cloud deployments with cost constraints - ✅ Kubernetes or container orchestration **Choose Full if:** - ✅ Want maximum optional dependencies preinstalled - ✅ Development and testing environments - ✅ "Batteries included" experience preferred