# Open WebUI

![Kreuzberg](https://img.shields.io/badge/kreuzberg-v4.7.0+-blue)

Open WebUI supports pluggable content extraction backends. Kreuzberg implements two of those backend APIs, the **docling-serve** endpoint and the **external document loader** endpoint, so it works as a drop-in replacement without patching Open WebUI.

## How it works

1. A user uploads a document (PDF, DOCX, image, etc.) in Open WebUI.
2. Open WebUI sends the file to Kreuzberg's API endpoint.
3. Kreuzberg extracts the content, running OCR where needed, and returns Markdown.
4. Open WebUI stores the Markdown in its vector database for retrieval-augmented generation.

Kreuzberg supports [90+ file formats](../reference/formats.md) and requires no GPU.

## Prerequisites

- Docker and Docker Compose (v2)
- Open WebUI running or ready to deploy
- No GPU required: Kreuzberg runs entirely on CPU

## Setup with Docker Compose

This is the fastest way to get both services running together.

```yaml title="docker-compose.yaml"
services:
  kreuzberg:
    image: ghcr.io/kreuzberg-dev/kreuzberg:latest-core
    ports:
      - "8000:8000"
    command: ["serve", "--host", "0.0.0.0", "--port", "8000"]
    volumes:
      # Persists downloaded models across restarts (see the note below)
      - kreuzberg-cache:/app/.kreuzberg
    healthcheck:
      test: ["CMD", "kreuzberg", "version"]
      interval: 10s
      timeout: 5s
      retries: 5

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      CONTENT_EXTRACTION_ENGINE: "docling"
      DOCLING_SERVER_URL: "http://kreuzberg:8000"
    depends_on:
      # Start Open WebUI only after the Kreuzberg healthcheck passes
      kreuzberg:
        condition: service_healthy

volumes:
  kreuzberg-cache:
```

Start both services in detached mode:

```bash
docker compose up -d
```

Open `http://localhost:3000`, create an account, and upload a document. The extracted text appears in the chat context.

!!! Note "Cache volume"

    The `kreuzberg-cache` volume persists OCR models and embedding weights across restarts. Without it, models re-download on every container restart (~90 MB–1.2 GB depending on configuration).

!!! Info "Already running Open WebUI?"
    Start Kreuzberg separately, then point Open WebUI to that Kreuzberg URL.

    === "Docker"

        ```bash
        docker run -d \
          --name kreuzberg \
          -p 8000:8000 \
          -v kreuzberg-cache:/app/.kreuzberg \
          ghcr.io/kreuzberg-dev/kreuzberg:latest-core \
          serve --host 0.0.0.0 --port 8000
        ```

    === "CLI (Homebrew / Cargo)"

        ```bash
        kreuzberg serve --host 0.0.0.0 --port 8000
        ```

    Then configure Open WebUI using one of the two engine modes below.

## Choosing an engine mode

Kreuzberg exposes two Open WebUI–compatible APIs. Both return the same extracted content, so pick whichever fits your setup.

|                    | **Docling** (recommended) | **External**                   |
| ------------------ | ------------------------- | ------------------------------ |
| **Endpoint**       | `POST /v1/convert/file`   | `PUT /process`                 |
| **Engine setting** | `docling`                 | `external`                     |
| **URL variable**   | `DOCLING_SERVER_URL`      | `EXTERNAL_DOCUMENT_LOADER_URL` |

=== "Docling (recommended)"

    Set these environment variables on the Open WebUI container:

    ```yaml
    environment:
      CONTENT_EXTRACTION_ENGINE: "docling"
      DOCLING_SERVER_URL: "http://kreuzberg:8000"
    ```

    Or via the Admin UI: **Settings → Documents → Content Extraction Engine** → select **Docling** → set server URL to `http://kreuzberg:8000`.

=== "External"

    Set these environment variables on the Open WebUI container:

    ```yaml
    environment:
      CONTENT_EXTRACTION_ENGINE: "external"
      EXTERNAL_DOCUMENT_LOADER_URL: "http://kreuzberg:8000"
    ```

    Or via the Admin UI: **Settings → Documents → Content Extraction Engine** → select **External** → set URL to `http://kreuzberg:8000`.

!!! Tip

    If Kreuzberg runs on a different host or port, replace `http://kreuzberg:8000` with the actual address. Inside Docker Compose, use the service name (`kreuzberg`). Outside Docker, use the host IP or `localhost`.

## Verify it works

Test the endpoints directly before debugging through Open WebUI.

=== "Docling endpoint"

    ```bash
    curl -s -F "files=@invoice.pdf" http://localhost:8000/v1/convert/file | jq .
    ```

    ```json title="Expected response"
    {
      "document": {
        "md_content": "# Invoice\n\nDate: 2026-01-15\n..."
      },
      "status": "success"
    }
    ```

=== "External endpoint"

    ```bash
    curl -s -X PUT \
      -H "Content-Type: application/pdf" \
      -H "X-Filename: invoice.pdf" \
      --data-binary @invoice.pdf \
      http://localhost:8000/process | jq .
    ```

    ```json title="Expected response"
    {
      "page_content": "# Invoice\n\nDate: 2026-01-15\n...",
      "metadata": {
        "source": "invoice.pdf"
      }
    }
    ```

If the endpoint returns extracted text, the integration is working. Upload a document through Open WebUI to confirm end-to-end.

## Next steps

- [Docker deployment guide](../guides/docker.md): image variants, volumes, security hardening
- [API server reference](../guides/api-server.md): all endpoints and configuration options
- [OCR guide](../guides/ocr.md): language packs, engine selection, tuning
- [Format support](../reference/formats.md): full list of supported file types
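
## Scripted smoke test

The manual checks under "Verify it works" can be wrapped in a small script, for example as a post-deploy check. The following is a minimal sketch and not part of Kreuzberg: the function names (`text_filter`, `send_file`, `smoke_test`) are hypothetical, the endpoints, headers, and response fields are copied from the curl examples in this guide, and the default `application/pdf` content type assumes a PDF input.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical. Endpoints and JSON fields
# mirror the curl examples in this guide. Requires curl and jq.

# Print the jq filter that selects the extracted text for an engine mode.
text_filter() {
  case "$1" in
    docling)  echo '.document.md_content' ;;
    external) echo '.page_content' ;;
    *) echo "unknown engine: $1" >&2; return 1 ;;
  esac
}

# Send FILE to the endpoint for ENGINE and print the raw JSON response.
# Usage: send_file ENGINE FILE [BASE_URL] [CONTENT_TYPE]
send_file() {
  local engine="$1" file="$2"
  local base_url="${3:-http://localhost:8000}"
  local content_type="${4:-application/pdf}"  # assumption: PDF input
  case "$engine" in
    docling)
      curl -sf -F "files=@${file}" "${base_url}/v1/convert/file" ;;
    external)
      curl -sf -X PUT \
        -H "Content-Type: ${content_type}" \
        -H "X-Filename: $(basename "$file")" \
        --data-binary "@${file}" \
        "${base_url}/process" ;;
    *) echo "unknown engine: $engine" >&2; return 1 ;;
  esac
}

# Succeed only if the response contains non-empty extracted text.
# Usage: smoke_test ENGINE FILE [BASE_URL]
smoke_test() {
  local engine="$1" file="$2" base_url="${3:-http://localhost:8000}"
  local filter text
  filter="$(text_filter "$engine")" || return 1
  # "// empty" makes jq print nothing when the field is missing or null.
  text="$(send_file "$engine" "$file" "$base_url" | jq -r "${filter} // empty")"
  [ -n "$text" ]
}
```

After sourcing the file, `smoke_test docling invoice.pdf && echo "docling endpoint OK"` checks the Docling-compatible endpoint, and `smoke_test external invoice.pdf` checks the external loader endpoint. Keeping the jq filter separate from the request makes it easy to adjust if the response shape differs in your Kreuzberg version.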