# MCP Integration v5.0.0

Kreuzberg speaks [Model Context Protocol](https://modelcontextprotocol.io/). That means any AI agent (Claude, Cursor, a custom LangChain pipeline) can extract documents, generate embeddings, and manage caches through a standard tool interface without writing extraction code.

Two commands to get started:

```bash title="Terminal"
pip install "kreuzberg[all]"
kreuzberg mcp
```

That's it. You now have an MCP server running over stdio, ready for any compatible client.

---

## How It Works

The MCP server wraps Kreuzberg's extraction engine behind standard tools. It runs as a child process and exchanges JSON-RPC messages over stdin/stdout, so no HTTP ports or configuration are needed.

```mermaid
flowchart LR
    A["AI Agent\n(Claude, Cursor, etc.)"] -->|"JSON-RPC\nover stdio"| B["kreuzberg mcp"]
    B --> C["Extraction Engine"]
    B --> D["Embedding Engine"]
    B --> E["Cache Layer"]
```

---

## Server Modes

### Stdio (Default)

The standard mode for local AI tools. The agent spawns `kreuzberg mcp` as a subprocess and communicates over pipes.

```bash title="Terminal"
kreuzberg mcp
kreuzberg mcp --config kreuzberg.toml
```

This is what Claude Desktop, Cursor, and most MCP clients expect.

### HTTP Transport

!!! Info "Feature flag: `mcp-http`"

    HTTP transport requires the `mcp-http` feature flag at build time.

For remote deployments or multi-client setups where stdio doesn't work (shared servers, team environments, cloud-hosted agents), HTTP transport exposes the same tool interface over the network.

---

## Tools

Kreuzberg exposes 13 tools via MCP.

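
Every tool is invoked with a JSON-RPC request written to the server's stdin. As a rough sketch of what one such request could look like on the wire, the snippet below builds a call to `extract_file`. The `tools/call` method name and the newline-delimited framing are assumptions taken from the MCP specification rather than from this guide, the helper names `build_tool_call` and `encode_for_stdio` are hypothetical, and the `path` argument mirrors the Python client example later in this guide. A real client also performs the `initialize` handshake first, which is omitted here.

```python
import json
from typing import Any


def build_tool_call(
    request_id: int, tool_name: str, arguments: dict[str, Any]
) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 request for MCP's `tools/call` method (assumed name)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def encode_for_stdio(message: dict[str, Any]) -> bytes:
    """Serialize one message as a single newline-terminated line of UTF-8 JSON."""
    # json.dumps escapes newlines inside strings, so the payload stays on one line.
    return (json.dumps(message) + "\n").encode("utf-8")


# Hypothetical request: ask the server to extract a local PDF.
request = build_tool_call(1, "extract_file", {"path": "document.pdf"})
frame = encode_for_stdio(request)
print(frame.decode("utf-8"), end="")
```

In practice the `mcp` SDK handles this framing for you; the sketch is only meant to show why the stdio mode needs nothing beyond a pair of pipes.
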
All extraction tools accept an optional `config` object to override defaults. The tools fall into five groups:

- **Extraction:** `extract_file`, `extract_bytes`, `batch_extract_files`, `detect_mime_type`, `extract_structured`
- **Embeddings:** `embed_text`
- **Chunking:** `chunk_text`
- **Cache:** `cache_stats`, `cache_clear`, `cache_manifest`, `cache_warm`
- **Metadata:** `list_formats`, `get_version`

`extract_structured` requires the server to be built with the `liter-llm` feature. Full parameter schemas are discoverable at runtime via the MCP client's `list_tools` call.

---

## Connecting AI Tools

### Claude Desktop

Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:

```json title="claude_desktop_config.json"
{
  "mcpServers": {
    "kreuzberg": {
      "command": "kreuzberg",
      "args": ["mcp"]
    }
  }
}
```

Restart Claude. Kreuzberg's tools appear automatically: ask Claude to "extract text from invoice.pdf" and it will call `extract_file` behind the scenes.

### Cursor

Add to `.cursor/mcp.json` in your project root:

```json title=".cursor/mcp.json"
{
  "mcpServers": {
    "kreuzberg": {
      "command": "kreuzberg",
      "args": ["mcp"]
    }
  }
}
```

### Python MCP Client

For building custom agent pipelines, use the official `mcp` Python SDK:

```python title="mcp_client.py"
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Launch `kreuzberg mcp` as a subprocess and talk to it over stdio.
    server_params = StdioServerParameters(
        command="kreuzberg",
        args=["mcp"],
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Perform the MCP handshake before issuing any requests.
            await session.initialize()

            tools = await session.list_tools()
            print(f"Available: {[t.name for t in tools.tools]}")

            result = await session.call_tool(
                "extract_file",
                arguments={"path": "document.pdf"},
            )
            print(result)


asyncio.run(main())
```

### Spawning from Python

If your application manages the server lifecycle directly:

```python title="spawn_server.py"
import subprocess

process = subprocess.Popen(
    ["python", "-m", "kreuzberg", "mcp"],
    stdin=subprocess.PIPE,  # the stdio transport reads JSON-RPC requests from stdin
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

print(f"MCP server running (PID {process.pid})")
```

---

## Configuration

Pass a TOML config file to set extraction defaults for all tools:

```bash title="Terminal"
kreuzberg mcp --config kreuzberg.toml
```

Individual tool calls override file defaults via a `config` parameter. See [ExtractionConfig Reference](../reference/configuration.md) for all available fields.

---

## Running in Docker

```bash title="Terminal"
docker run ghcr.io/kreuzberg-dev/kreuzberg:latest mcp

docker run \
  -v $(pwd)/kreuzberg.toml:/config/kreuzberg.toml \
  ghcr.io/kreuzberg-dev/kreuzberg:latest \
  mcp --config /config/kreuzberg.toml
```

For production, use Compose with a persistent cache volume so embedding models don't re-download on restart:

```yaml title="docker-compose.yaml"
services:
  kreuzberg-mcp:
    image: ghcr.io/kreuzberg-dev/kreuzberg:latest
    command: mcp --config /config/kreuzberg.toml
    volumes:
      - ./kreuzberg.toml:/config/kreuzberg.toml:ro
      - cache-data:/app/.kreuzberg
    restart: unless-stopped

volumes:
  cache-data:
```

---

## What to Read Next

- [API Server Guide](api-server.md): the HTTP REST API and detailed MCP tool reference
- [Docker Deployment](docker.md): container setup for all server modes
- [Configuration Reference](../reference/configuration.md): every config option explained