Files
fil/docs/integrations/openwebui.md
Henrik Jess Nielsen b4c07d3693
All checks were successful
Deploy fil (kreuzberg) / deploy (push) Successful in 49s
Nomad changes
2026-06-01 23:40:55 +02:00

5.3 KiB
Raw Blame History

Open WebUI

Kreuzberg

Open WebUI supports pluggable content extraction backends. Kreuzberg implements two of those backend APIs — the docling-serve endpoint and the external document loader endpoint, so it works as a drop-in replacement without patching Open WebUI.

How it works

  1. A user uploads a document (PDF, DOCX, image, etc.) in Open WebUI.
  2. Open WebUI sends the file to Kreuzberg's API endpoint.
  3. Kreuzberg extracts the content — running OCR where needed and returns Markdown.
  4. Open WebUI stores the Markdown in its vector database for retrieval-augmented generation.

Kreuzberg supports 90+ file formats and requires no GPU.

Prerequisites

  • Docker and Docker Compose (v2)
  • Open WebUI running or ready to deploy
  • No GPU required — Kreuzberg runs entirely on CPU

Setup with Docker Compose

This is the fastest way to get both services running together.

services:
  kreuzberg:
    image: ghcr.io/kreuzberg-dev/kreuzberg:latest-core
    ports:
      - "8000:8000"
    command: ["serve", "--host", "0.0.0.0", "--port", "8000"]
    volumes:
      - kreuzberg-cache:/app/.kreuzberg
    healthcheck:
      test: ["CMD", "kreuzberg", "version"]
      interval: 10s
      timeout: 5s
      retries: 5

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      CONTENT_EXTRACTION_ENGINE: "docling"
      DOCLING_SERVER_URL: "http://kreuzberg:8000"
    depends_on:
      kreuzberg:
        condition: service_healthy

volumes:
  kreuzberg-cache:

Start both services in detached mode:

docker compose up -d

Open http://localhost:3000, create an account, and upload a document. The extracted text will appear in the chat context.

!!! Note "Cache volume" The kreuzberg-cache volume persists OCR models and embedding weights across restarts. Without it, models re-download on every container restart (~90 MB1.2 GB depending on configuration).

!!! Info "Already running Open WebUI?" Start Kreuzberg separately, then point Open WebUI to that Kreuzberg URL.

=== "Docker"

```bash
docker run -d \
  --name kreuzberg \
  -p 8000:8000 \
  -v kreuzberg-cache:/app/.kreuzberg \
  ghcr.io/kreuzberg-dev/kreuzberg:latest-core \
  serve --host 0.0.0.0 --port 8000
```

=== "CLI (Homebrew / Cargo)"

```bash
kreuzberg serve --host 0.0.0.0 --port 8000
```

Then configure Open WebUI using one of the two engine modes below.

Choosing an engine mode

Kreuzberg exposes two Open WebUIcompatible APIs. Both return the same extracted content. So pick whichever fits your setup.

Docling (recommended) External
Endpoint POST /v1/convert/file PUT /process
Engine setting docling external
URL variable DOCLING_SERVER_URL EXTERNAL_DOCUMENT_LOADER_URL

=== "Docling (recommended)"

Set these environment variables on the Open WebUI container:

```yaml
environment:
  CONTENT_EXTRACTION_ENGINE: "docling"
  DOCLING_SERVER_URL: "http://kreuzberg:8000"
```

Or via the Admin UI: **Settings → Documents → Content Extraction Engine** → select **Docling** → set server URL to `http://kreuzberg:8000`.

=== "External"

Set these environment variables on the Open WebUI container:

```yaml
environment:
  CONTENT_EXTRACTION_ENGINE: "external"
  EXTERNAL_DOCUMENT_LOADER_URL: "http://kreuzberg:8000"
```

Or via the Admin UI: **Settings → Documents → Content Extraction Engine** → select **External** → set URL to `http://kreuzberg:8000`.

!!! Tip If Kreuzberg runs on a different host or port, replace http://kreuzberg:8000 with the actual address. Inside Docker Compose, use the service name (kreuzberg). Outside Docker, use the host IP or localhost.

Verify it works

Test the endpoints directly before debugging through Open WebUI.

=== "Docling endpoint"

```bash
curl -s -F "files=@invoice.pdf" http://localhost:8000/v1/convert/file | jq .
```

```json title="Expected response"
{
  "document": {
    "md_content": "# Invoice\n\nDate: 2026-01-15\n..."
  },
  "status": "success"
}
```

=== "External endpoint"

```bash
curl -s -X PUT \
  -H "Content-Type: application/pdf" \
  -H "X-Filename: invoice.pdf" \
  --data-binary @invoice.pdf \
  http://localhost:8000/process | jq .
```

```json title="Expected response"
{
  "page_content": "# Invoice\n\nDate: 2026-01-15\n...",
  "metadata": {
    "source": "invoice.pdf"
  }
}
```

If the endpoint returns extracted text, the integration is working. Upload a document through Open WebUI to confirm end-to-end.

Next steps