fil/docs/guides/llm-integration.md

# LLM Integration <span class="version-badge">v4.8.0</span>

Kreuzberg integrates with 143 LLM providers (including local inference engines) via [liter-llm](https://github.com/kreuzberg-dev/liter-llm) for three capabilities: VLM OCR, structured extraction, and provider-hosted embeddings.

!!! Note "Feature gate" Requires the `liter-llm` Cargo feature. Not included in the default feature set.

## VLM OCR

Use vision-language models as an OCR backend by rendering document pages as images and sending them to the VLM for text extraction.

### When to Use

- Low-quality scanned documents where traditional OCR struggles
- Handwritten text recognition
- Arabic, Farsi, and other scripts with poor Tesseract/PaddleOCR support
- Complex layouts where traditional OCR fails (mixed tables, forms, diagrams)
- When you need higher accuracy and can accept higher latency and API costs

### Configuration

=== "Python"

    --8<-- "snippets/python/llm/vlm_ocr.md"

=== "TypeScript"

    --8<-- "snippets/typescript/llm/vlm_ocr.md"

=== "Rust"

    ```rust title="Rust"
    use kreuzberg::{extract_file, ExtractionConfig, OcrConfig, LlmConfig};

    let config = ExtractionConfig {
        force_ocr: true,
        ocr: Some(OcrConfig {
            backend: "vlm".to_string(),
            vlm_config: Some(LlmConfig {
                model: "openai/gpt-4o-mini".to_string(),
                ..Default::default()
            }),
            ..Default::default()
        }),
        ..Default::default()
    };
    let result = extract_file("scan.pdf", None, &config).await?;
    ```

=== "CLI"

    ```bash title="Terminal"
    kreuzberg extract scan.pdf --force-ocr true \
      --vlm-model openai/gpt-4o-mini
    ```

=== "TOML"

    ```toml title="kreuzberg.toml"
    force_ocr = true

    [ocr]
    backend = "vlm"

    [ocr.vlm_config]
    model = "openai/gpt-4o-mini"
    ```

=== "Environment Variables"

    ```bash title="Terminal"
    export KREUZBERG_VLM_OCR_MODEL=openai/gpt-4o-mini
    export OPENAI_API_KEY=sk-...
    ```

### Custom VLM Prompt

Override the default prompt template for VLM OCR:

```python title="Python"
from kreuzberg import ExtractionConfig, OcrConfig, LlmConfig

config = ExtractionConfig(
    force_ocr=True,
    ocr=OcrConfig(
        backend="vlm",
        vlm_config=LlmConfig(model="openai/gpt-4o-mini"),
        vlm_prompt="Extract all text from this document image. Preserve formatting.",
    ),
)
```

### Supported Providers

Any liter-llm vision-capable provider works as a VLM OCR backend:

| Provider          | Example Model                          |
| ----------------- | -------------------------------------- |
| OpenAI            | `openai/gpt-4o`, `openai/gpt-4o-mini`  |
| Anthropic         | `anthropic/claude-3-5-sonnet-20241022` |
| Google            | `google/gemini-2.0-flash`              |
| Groq              | `groq/llama-3.2-90b-vision-preview`    |
| Ollama (local)    | `ollama/llama3.2-vision`               |
| LM Studio (local) | `lmstudio/llava-1.5`                   |
| vLLM (local)      | `vllm/llava-next`                      |

## Structured Extraction

Extract structured JSON data from documents by providing a schema; the document text is sent to an LLM for conforming extraction.

### Basic Usage

=== "Python"

    --8<-- "snippets/python/llm/structured_extraction.md"

=== "TypeScript"

    --8<-- "snippets/typescript/llm/structured_extraction.md"

=== "Rust"

    --8<-- "snippets/rust/llm/structured_extraction.md"

=== "CLI"

    ```bash title="Terminal"
    kreuzberg extract-structured paper.pdf \
      --schema schema.json \
      --model openai/gpt-4o-mini \
      --strict
    ```

=== "TOML"

    ```toml title="kreuzberg.toml"
    [structured_extraction]
    schema_name = "paper_metadata"
    strict = true

    [structured_extraction.schema]
    type = "object"

    [structured_extraction.schema.properties.title]
    type = "string"

    [structured_extraction.schema.properties.date]
    type = "string"

    [structured_extraction.llm]
    model = "openai/gpt-4o-mini"
    ```

### Custom Prompts (Jinja2)

Override the default extraction prompt with a Jinja2 template:

```python title="Python"
from kreuzberg import ExtractionConfig, StructuredExtractionConfig, LlmConfig

config = ExtractionConfig(
    structured_extraction=StructuredExtractionConfig(
        schema={"type": "object", "properties": {"title": {"type": "string"}}},
        llm=LlmConfig(model="openai/gpt-4o-mini"),
        prompt=(
            "Analyze this document and extract key metadata.\n\n"
            "Document:\n{{ content }}\n\n"
            "Schema: {{ schema }}"
        ),
    ),
)
```

Available template variables:

| Variable                   | Description                               |
| -------------------------- | ----------------------------------------- |
| `{{ content }}`            | The extracted document text               |
| `{{ schema }}`             | The JSON schema as a formatted string     |
| `{{ schema_name }}`        | The schema name (default: `"extraction"`) |
| `{{ schema_description }}` | The schema description (may be empty)     |

### Cross-Provider Compatibility

Structured extraction handles provider differences automatically:

- **OpenAI**: Full strict mode with `additionalProperties` enforcement
- **Anthropic/Gemini**: `additionalProperties` automatically stripped (not supported by these providers)
- **All providers**: Markdown code fence wrapping in responses is automatically handled

### Strict Mode

When `strict=True`, the LLM is instructed to produce output that exactly matches the schema. This enables OpenAI's structured output mode and adds validation on the response.

## VLM Embeddings

Use provider-hosted embedding models when you need to match your vector database model or local ONNX models are unavailable.

### Configuration

=== "Python"

    --8<-- "snippets/python/llm/vlm_embeddings.md"

=== "TypeScript"

    ```typescript title="TypeScript"
    import { embedSync } from '@kreuzberg/node';

    const embeddings = embedSync(['Hello world'], {
      model: {
        modelType: 'llm',
        value: 'openai/text-embedding-3-small',
      },
      normalize: true,
    });
    console.log(embeddings[0].length); // 1536
    ```

=== "Rust"

    ```rust title="Rust"
    use kreuzberg::{embed_texts, EmbeddingConfig, EmbeddingModelType, LlmConfig};

    let config = EmbeddingConfig {
        model: EmbeddingModelType::Llm {
            llm: LlmConfig {
                model: "openai/text-embedding-3-small".to_string(),
                ..Default::default()
            },
        },
        normalize: true,
        ..Default::default()
    };
    let embeddings = embed_texts(&["Hello world"], &config)?;
    ```

=== "CLI"

    ```bash title="Terminal"
    kreuzberg embed \
      --provider llm \
      --model openai/text-embedding-3-small \
      --text "Hello world"
    ```

### Available Models

| Model                                    | Dimensions | Provider |
| ---------------------------------------- | ---------- | -------- |
| `openai/text-embedding-3-small`          | 1536       | OpenAI   |
| `openai/text-embedding-3-large`          | 3072       | OpenAI   |
| `mistral/mistral-embed`                  | 1024       | Mistral  |
| Any liter-llm embedding-capable provider | Varies     | Various  |

## Local LLM Support

<span class="version-badge">v4.8.0</span>

Run local LLM inference engines via [liter-llm](https://github.com/kreuzberg-dev/liter-llm)'s provider routing; point to your local server without needing an API key.

### Supported Local Engines

| Engine                                                 | Prefix       | Default URL                 | Install               |
| ------------------------------------------------------ | ------------ | --------------------------- | --------------------- |
| [Ollama](https://ollama.com)                           | `ollama/`    | `http://localhost:11434/v1` | `brew install ollama` |
| [LM Studio](https://lmstudio.ai)                       | `lmstudio/`  | `http://localhost:1234/v1`  | Desktop app           |
| [vLLM](https://vllm.ai)                                | `vllm/`      | `http://localhost:8000/v1`  | `pip install vllm`    |
| [llama.cpp](https://github.com/ggerganov/llama.cpp)    | `llamacpp/`  | `http://localhost:8080/v1`  | Build from source     |
| [LocalAI](https://localai.io)                          | `localai/`   | `http://localhost:8080/v1`  | Docker                |
| [llamafile](https://github.com/Mozilla-Ocho/llamafile) | `llamafile/` | `http://localhost:8080/v1`  | Single binary         |

### Example: Ollama

=== "CLI" ```Bash

    # Start Ollama and pull a model

    ollama pull llama3.2-vision

    # Use it for VLM OCR (no API key needed)
    kreuzberg extract scan.pdf --force-ocr true \
      --vlm-model ollama/llama3.2-vision

    # Use it for structured extraction
    kreuzberg extract-structured doc.pdf \
      --schema schema.json \
      --model ollama/llama3.2

    # Use it for embeddings
    kreuzberg embed --provider llm \
      --model ollama/all-minilm \
      --text "Hello world"
    ```

=== "Python" ```python from Kreuzberg import extract_file, ExtractionConfig, StructuredExtractionConfig, LlmConfig

    config = ExtractionConfig(
        structured_extraction=StructuredExtractionConfig(
            schema={"type": "object", "properties": {"title": {"type": "string"}}},
            llm=LlmConfig(model="ollama/llama3.2"),  # No api_key needed
        ),
    )
    result = await extract_file("doc.pdf", config=config)
    ```

=== "TOML Config" ```toml [structured_extraction.llm] model = "ollama/llama3.2"

    # No api_key needed for local providers
    ```

!!! Tip "Custom Base URL" If your local server runs on a non-default port, use `base_url`:
`python
    LlmConfig(model="ollama/llama3.2", base_url="http://localhost:11435/v1")`

## LLM Usage Tracking

Every LLM call made during extraction is tracked in the `llm_usage` field of `ExtractionResult`. Each entry records the model used, token counts, estimated cost, and why the model stopped generating.

=== "Python"

    ```python
    result = await extract_file("document.pdf", config)
    if result.get("llm_usage"):
        for usage in result["llm_usage"]:
            print(f"{usage['source']}: {usage['input_tokens']} in, {usage['output_tokens']} out, ${usage['estimated_cost']:.4f}")
    ```

=== "TypeScript"

    ```typescript
    const result = await extractFile("document.pdf", config);
    for (const usage of result.llmUsage ?? []) {
        console.log(`${usage.source}: ${usage.inputTokens} in, ${usage.outputTokens} out, $${usage.estimatedCost?.toFixed(4)}`);
    }
    ```

=== "Rust"

    ```rust
    let result = extract_file("document.pdf", &config).await?;
    if let Some(usages) = &result.llm_usage {
        for usage in usages {
            println!("{}: {} in, {} out", usage.source, usage.input_tokens.unwrap_or(0), usage.output_tokens.unwrap_or(0));
        }
    }
    ```

The `source` field indicates which pipeline stage triggered the call: `"vlm_ocr"`, `"structured_extraction"`, or `"embeddings"`.

## API Key Configuration

API keys can be set via (in order of precedence):

1. `api_key` field in `LlmConfig` — highest priority, per-request
2. Provider standard env vars (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY`, etc.)
3. Kreuzberg-specific env var (`KREUZBERG_LLM_API_KEY`) — used as fallback for any provider

!!! Note "Local providers skip API key lookup" Local inference engines (Ollama, LM Studio, vLLM, llama.cpp, LocalAI, llamafile) do not require an API key. If you use a local provider prefix (for example, `ollama/`), the API key fields are ignored.

```python title="Python"
from kreuzberg import LlmConfig

# Explicit API key
config = LlmConfig(model="openai/gpt-4o", api_key="sk-...")

# Custom base URL (e.g., Azure OpenAI, local proxy)
config = LlmConfig(
    model="openai/gpt-4o",
    base_url="https://my-proxy.example.com/v1",
)
```

## LlmConfig Reference

| Field          | Type            | Default    | Description                                                         |
| -------------- | --------------- | ---------- | ------------------------------------------------------------------- |
| `model`        | `str`           | _required_ | Provider/model in liter-llm format (for example, `"openai/gpt-4o"`) |
| `api_key`      | `str \| None`   | `None`     | API key (falls back to env vars)                                    |
| `base_url`     | `str \| None`   | `None`     | Custom endpoint URL                                                 |
| `timeout_secs` | `int \| None`   | `60`       | Request timeout in seconds                                          |
| `max_retries`  | `int \| None`   | `3`        | Maximum retry attempts                                              |
| `temperature`  | `float \| None` | `None`     | Sampling temperature                                                |
| `max_tokens`   | `int \| None`   | `None`     | Maximum tokens to generate                                          |

## REST API

### Structured Extraction

`POST /extract-structured` — multipart form with file + schema + model configuration.

```bash title="Terminal"
curl -X POST http://localhost:4000/extract-structured \
  -F "file=@invoice.pdf" \
  -F 'schema={"type":"object","properties":{"vendor":{"type":"string"},"total":{"type":"number"}}}' \
  -F "model=openai/gpt-4o-mini" \
  -F "strict=true"
```

## MCP Tools

When running Kreuzberg as an MCP server, LLM features are available as tools:

- `extract_structured` — extract structured data from a document using a JSON schema
- `embed_text` — extended with `model` parameter for LLM-hosted embeddings

## Related

- [OCR](ocr.md) — OCR backends including VLM OCR
- [Configuration Reference](configuration.md) — full field reference for all config types
- [Advanced Features](advanced.md) — chunking, language detection, local embeddings
- [API Server](api-server.md) — REST API endpoints