Nomad changes

Henrik Jess Nielsen
2026-06-01 23:40:55 +02:00
parent 72b1a0a6ed
commit b4c07d3693
5723 changed files with 1130655 additions and 0 deletions

docker/README.md Normal file

@@ -0,0 +1,137 @@
# Kreuzberg Docker Images
This directory contains Dockerfile variants for building Kreuzberg Docker images with different feature sets.
## Base Image
Both variants use **Debian 13 (Trixie) slim**, the latest stable Debian release, chosen for package availability and security updates.
## Image Variants
### 1. Core Image (`Dockerfile.core`)
**Size:** ~1.0-1.3GB
**Base:** debian:trixie-slim
**Features:** PDF, DOCX, PPTX, images, HTML, XML, text, Excel, email, academic formats (LaTeX, EPUB, etc.)
**OCR:** Tesseract (12 languages)
**Legacy Office:** Native OLE/CFB parsing support
**When to use:**
- Production deployments where image size matters
- Cloud environments with size/bandwidth constraints
- Kubernetes deployments with frequent pod scaling
- Any use case that needs legacy Office support (both images provide it equally)
**Build command:**
```bash
docker build -f docker/Dockerfile.core -t kreuzberg:core .
```
### 2. Full Image (`Dockerfile.full`)
**Size:** ~1.0-1.3GB
**Base:** debian:trixie-slim
**Features:** All core features with native legacy Office format support
**OCR:** Tesseract (12 languages)
**Legacy Office:** Native OLE/CFB parsing for .doc, .ppt, .xls
**When to use:**
- Complete document intelligence pipeline with all optional dependencies
- Development and testing environments
- When you want maximum feature completeness
**Build command:**
```bash
docker build -f docker/Dockerfile.full -t kreuzberg:full .
```
## Size Comparison
| Component | Core | Full | Difference |
| -------------------- | -------------- | -------------- | ----------------- |
| Base (trixie-slim) | ~120MB | ~120MB | - |
| Tesseract + 12 langs | ~250MB | ~250MB | - |
| Rust binary | ~80MB | ~80MB | - |
| System libraries | ~100MB | ~100MB | - |
| **Total (approx)** | **~1.0-1.3GB** | **~1.0-1.3GB** | **- (same size)** |
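
As a quick sanity check on the table, the itemized rows can be summed. This is a minimal sketch using the approximate figures from the table; the variable names are illustrative only.

```bash
#!/usr/bin/env bash
# Approximate component sizes in MB, copied from the table above.
base=120
tesseract=250
rust_binary=80
system_libs=100

# Sum of the itemized rows only (not the full image size).
itemized_total=$((base + tesseract + rust_binary + system_libs))
echo "Itemized components: ~${itemized_total}MB"
```

The itemized rows come to roughly 550MB, so the table does not break down everything that contributes to the stated ~1.0-1.3GB total.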
## Default Image
The root `Dockerfile` is a symlink to `Dockerfile.full` for backward compatibility and complete feature support by default.
## Multi-Architecture Support
Both images support:
- `linux/amd64` (x86_64)
- `linux/arm64` (aarch64)
Both architectures use the same pure-Rust PDF library, so no architecture-specific binaries are needed.
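
A multi-architecture build can be sketched with `docker buildx`. This is an illustration, not necessarily how the project's images are built: the helper name `buildx_cmd` is hypothetical, and it only prints the command so it can be reviewed before running.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a buildx command targeting both supported platforms.
# Assumes the Dockerfile naming used in this directory (docker/Dockerfile.<variant>).
buildx_cmd() {
  local variant="$1"
  local platforms="linux/amd64,linux/arm64"
  printf 'docker buildx build --platform %s -f docker/Dockerfile.%s -t kreuzberg:%s .\n' \
    "$platforms" "$variant" "$variant"
}

# Print the command for each variant.
buildx_cmd core
buildx_cmd full
```

Multi-platform builds typically also need an output option such as `--push`; depending on the Docker setup, loading a multi-platform result into the local image store may not be supported.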
## Usage Modes
All images support three execution modes via ENTRYPOINT:
### 1. API Server (default)
```bash
docker run -p 8000:8000 kreuzberg:core
# or override host/port:
docker run -p 8000:8000 kreuzberg:core serve --host 0.0.0.0 --port 8000
```
### 2. CLI Mode
```bash
# Mount the current directory at /data; quoting "$(pwd)" keeps paths with spaces intact
docker run -v "$(pwd)":/data kreuzberg:core extract /data/document.pdf
docker run -v "$(pwd)":/data kreuzberg:core detect /data/file.bin
docker run -v "$(pwd)":/data kreuzberg:core batch /data/*.pdf
```
### 3. MCP Server Mode
```bash
docker run kreuzberg:core mcp
```
## Testing
Test scripts are provided to verify both image variants:
```bash
# Test core image
IMAGE_NAME=kreuzberg:core ./scripts/test_docker.sh
# Test full image
IMAGE_NAME=kreuzberg:full ./scripts/test_docker.sh
```
## GitHub Actions
The `.github/workflows/publish-docker.yaml` workflow builds and publishes both variants to GitHub Container Registry:
- `ghcr.io/kreuzberg-dev/kreuzberg:VERSION-core` - Core image (minimal runtime)
- `ghcr.io/kreuzberg-dev/kreuzberg:core` - Latest core image
- `ghcr.io/kreuzberg-dev/kreuzberg:VERSION` - Full image (all optional dependencies)
- `ghcr.io/kreuzberg-dev/kreuzberg:latest` - Latest full image
For local development, use the local tags shown in the build commands above.
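
The tag scheme above can be expressed as a small lookup. This is a sketch only: `image_tags` is a hypothetical helper, not part of the repository, and `1.2.3` is a placeholder version.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the registry tags for a variant and version,
# following the tag scheme listed above.
image_tags() {
  local variant="$1" version="$2"
  local repo="ghcr.io/kreuzberg-dev/kreuzberg"
  case "$variant" in
    core) printf '%s\n' "${repo}:${version}-core" "${repo}:core" ;;
    full) printf '%s\n' "${repo}:${version}" "${repo}:latest" ;;
    *)    echo "unknown variant: ${variant}" >&2; return 1 ;;
  esac
}

# Placeholder version, for illustration only.
image_tags core 1.2.3
```

The full variant maps to the unsuffixed tags, which is consistent with it being the default image.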
## Recommendations
**Choose Core for:**
- ✅ Minimal runtime setup
- ✅ Standard document processing needs
- ✅ Cloud deployments with cost constraints
- ✅ Kubernetes or container orchestration
**Choose Full for:**
- ✅ The maximum set of optional dependencies preinstalled
- ✅ Development and testing environments
- ✅ A "batteries included" experience