MCP Integration v5.0.0
Kreuzberg speaks Model Context Protocol. That means any AI agent — Claude, Cursor, a custom LangChain pipeline — can extract documents, generate embeddings, and manage caches through a standard tool interface without writing extraction code.
Two commands to get started:
pip install "kreuzberg[all]"
kreuzberg mcp
That's it. You now have an MCP server running over stdio, ready for any compatible client.
How It Works
The MCP server wraps Kreuzberg's extraction engine behind standard tools, running as a child process over stdin/stdout with JSON-RPC messages — no HTTP ports or configuration needed.
flowchart LR
A["AI Agent\n(Claude, Cursor, etc.)"] -->|"JSON-RPC\nover stdio"| B["kreuzberg mcp"]
B --> C["Extraction Engine"]
B --> D["Embedding Engine"]
B --> E["Cache Layer"]
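To make the wire format concrete, here is a minimal sketch of how one request could be framed for the stdio transport. It assumes the standard MCP convention of one JSON-RPC 2.0 message per newline-terminated line. The helper names `encode_message` and `decode_message` are illustrative and not part of Kreuzberg. The `path` argument mirrors the `extract_file` call in the Python client example in this guide.

```python
import json
from typing import Any


def encode_message(method: str, params: dict[str, Any], request_id: int) -> bytes:
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        "params": params,
    }
    # json.dumps escapes newlines inside string values, so the payload
    # itself never contains a raw line break.
    return json.dumps(message).encode("utf-8") + b"\n"


def decode_message(line: bytes) -> dict[str, Any]:
    """Parse one line read from the peer back into a message dict."""
    return json.loads(line.decode("utf-8"))


# A tool invocation as an MCP client would send it to the server's stdin.
request = encode_message(
    "tools/call",
    {"name": "extract_file", "arguments": {"path": "document.pdf"}},
    request_id=1,
)
```

In practice an MCP client library handles this framing, along with the initialization handshake that must precede any tool call, so the sketch is mainly useful when debugging raw traffic.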
Server Modes
Stdio (Default)
The standard mode for local AI tools. The agent spawns kreuzberg mcp as a subprocess and communicates over pipes.
kreuzberg mcp
kreuzberg mcp --config kreuzberg.toml
This is what Claude Desktop, Cursor, and most MCP clients expect.
HTTP Transport
!!! Info "Feature flag: mcp-http"
    HTTP transport requires the mcp-http feature flag at build time.
For remote deployments or multi-client setups where stdio doesn't work — shared servers, team environments, cloud-hosted agents — HTTP transport exposes the same tool interface over the network.
Tools
Kreuzberg exposes 13 tools via MCP. All extraction tools accept an optional config object to override defaults:
Extraction: extract_file, extract_bytes, batch_extract_files, detect_mime_type, extract_structured
Embeddings: embed_text
Chunking: chunk_text
Cache: cache_stats, cache_clear, cache_manifest, cache_warm
Metadata: list_formats, get_version
extract_structured requires the server to be built with the liter-llm feature. Full parameter schemas are discoverable at runtime via the MCP client's list_tools call.
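Because the available set depends on build features, a client may want to compare what the server reports against the list above. The following is a rough sketch of such a check. `EXPECTED_TOOLS` simply restates the names listed here, and `missing_tools` is a hypothetical helper, not a Kreuzberg API.

```python
# Tool names as documented above, grouped by category.
EXPECTED_TOOLS: dict[str, list[str]] = {
    "Extraction": [
        "extract_file",
        "extract_bytes",
        "batch_extract_files",
        "detect_mime_type",
        "extract_structured",
    ],
    "Embeddings": ["embed_text"],
    "Chunking": ["chunk_text"],
    "Cache": ["cache_stats", "cache_clear", "cache_manifest", "cache_warm"],
    "Metadata": ["list_formats", "get_version"],
}


def missing_tools(reported: list[str]) -> list[str]:
    """Return documented tools that the server did not report, in listed order."""
    reported_set = set(reported)
    return [
        name
        for names in EXPECTED_TOOLS.values()
        for name in names
        if name not in reported_set
    ]


# Example: a server built without the liter-llm feature would be expected
# to omit extract_structured from its list_tools response.
all_names = [name for names in EXPECTED_TOOLS.values() for name in names]
reported_without_llm = [name for name in all_names if name != "extract_structured"]
absent = missing_tools(reported_without_llm)
```

With the Python SDK, the `reported` list would come from `[t.name for t in tools.tools]` after a `list_tools` call.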
Connecting AI Tools
Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
  "mcpServers": {
    "kreuzberg": {
      "command": "kreuzberg",
      "args": ["mcp"]
    }
  }
}
Restart Claude. Kreuzberg's tools appear automatically — ask Claude to "extract text from invoice.pdf" and it will call extract_file behind the scenes.
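If the config file already lists other MCP servers, the Kreuzberg entry has to be merged in rather than pasted over the whole file. A minimal sketch of that merge follows. `register_server` is a hypothetical helper, `other-server` is a made-up existing entry, and the demo writes to a scratch directory rather than the real Claude config.

```python
import json
import tempfile
from pathlib import Path


def register_server(config_path: Path, name: str, command: str, args: list[str]) -> dict:
    """Add or update one entry under "mcpServers", keeping existing entries."""
    raw = config_path.read_text(encoding="utf-8") if config_path.exists() else ""
    config = json.loads(raw) if raw.strip() else {}
    # setdefault keeps any servers that are already registered.
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": args}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo against a scratch file that already contains another (hypothetical) server.
with tempfile.TemporaryDirectory() as scratch:
    demo_path = Path(scratch) / "claude_desktop_config.json"
    existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
    demo_path.write_text(json.dumps(existing), encoding="utf-8")

    updated = register_server(demo_path, "kreuzberg", "kreuzberg", ["mcp"])
    on_disk = json.loads(demo_path.read_text(encoding="utf-8"))
```

The same helper would work for any client whose config uses the `mcpServers` layout. Back up the real file before pointing the helper at it.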
Cursor
Add to .cursor/mcp.json in your project root:
{
  "mcpServers": {
    "kreuzberg": {
      "command": "kreuzberg",
      "args": ["mcp"]
    }
  }
}
Python MCP Client
For building custom agent pipelines, use the official mcp Python SDK:
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Launch `kreuzberg mcp` as a subprocess and talk to it over stdio.
    server_params = StdioServerParameters(
        command="kreuzberg", args=["mcp"]
    )
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Complete the MCP handshake before issuing any requests.
            await session.initialize()

            tools = await session.list_tools()
            print(f"Available: {[t.name for t in tools.tools]}")

            result = await session.call_tool(
                "extract_file",
                arguments={"path": "document.pdf"},
            )
            print(result)


asyncio.run(main())
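Printing the raw result is fine for a first test, but a pipeline usually wants the text itself. The sketch below assumes the tool result follows the MCP SDK's general shape: an `isError` flag plus a `content` list whose text items carry `type == "text"` and a `text` attribute. What Kreuzberg places inside that text (plain text or serialized JSON) is not specified here, so treat `result_text` as a hypothetical starting point. The stand-in objects exist only so the example runs without a server.

```python
from types import SimpleNamespace
from typing import Any


def result_text(result: Any) -> str:
    """Join the text items of a tool result, raising if the call failed."""
    if getattr(result, "isError", False):
        raise RuntimeError("tool call reported an error")
    parts = [
        item.text
        for item in result.content
        if getattr(item, "type", None) == "text"
    ]
    return "\n".join(parts)


# Stand-in for a real call_tool() result, with one text and one non-text item.
fake_result = SimpleNamespace(
    isError=False,
    content=[
        SimpleNamespace(type="text", text="Invoice total: 120.00"),
        SimpleNamespace(type="image", data="..."),
    ],
)
text = result_text(fake_result)
```

In the client above, this would replace `print(result)` with `print(result_text(result))`.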
Spawning from Python
If your application manages the server lifecycle directly:
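import subprocess

# The stdio transport reads requests from stdin, so stdin must be a pipe too.
process = subprocess.Popen(
    ["python", "-m", "kreuzberg", "mcp"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
print(f"MCP server running (PID {process.pid})")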
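An application that starts the server is also responsible for stopping it. One possible shutdown routine is sketched below: ask the process to terminate, wait briefly, and force-kill only if it does not exit. `stop_server` is a hypothetical helper, and the demo uses a sleeping Python process as a stand-in so it runs without Kreuzberg installed.

```python
import subprocess
import sys


def stop_server(process: subprocess.Popen, timeout: float = 5.0) -> int:
    """Terminate a child process, escalating to kill after `timeout` seconds."""
    if process.poll() is None:
        process.terminate()
        try:
            process.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            # The process ignored the polite request; force it to stop.
            process.kill()
            process.wait()
    # Close our ends of the pipes so file descriptors are not leaked.
    for stream in (process.stdin, process.stdout, process.stderr):
        if stream is not None:
            stream.close()
    return process.returncode


# Stand-in child process; a real application would pass the MCP server here.
stand_in = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(60)"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
exit_code = stop_server(stand_in)
```

The exit code differs by platform (a negative signal number on POSIX), so callers should treat any non-`None` value as "stopped" rather than compare against a fixed number.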
Configuration
Pass a TOML config file to set extraction defaults for all tools:
kreuzberg mcp --config kreuzberg.toml
Individual tool calls override file defaults via a config parameter. See ExtractionConfig Reference for all available fields.
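The precedence rule can be illustrated with a small sketch. This is not Kreuzberg's actual merge logic: it assumes a simple top-level override, and the source does not say whether nested sections merge key by key. The field names `use_cache` and `force_ocr` are placeholders chosen for illustration, so check the ExtractionConfig Reference for the real ones.

```python
from typing import Any, Optional


def effective_config(
    file_defaults: dict[str, Any],
    call_overrides: Optional[dict[str, Any]],
) -> dict[str, Any]:
    """Return file defaults with per-call values layered on top (top-level only)."""
    merged = dict(file_defaults)  # copy so the defaults stay untouched
    merged.update(call_overrides or {})
    return merged


# Hypothetical defaults, as they might be loaded from kreuzberg.toml.
file_defaults = {"use_cache": True, "force_ocr": False}

# Arguments for a single extract_file call that overrides one field.
call_arguments = {"path": "scan.pdf", "config": {"force_ocr": True}}

resolved = effective_config(file_defaults, call_arguments.get("config"))
```

Under this model, a call that omits `config` runs with the file defaults unchanged.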
Running in Docker
# -i keeps stdin open, which the stdio transport needs
docker run -i ghcr.io/kreuzberg-dev/kreuzberg:latest mcp
docker run -i \
  -v "$(pwd)/kreuzberg.toml:/config/kreuzberg.toml" \
  ghcr.io/kreuzberg-dev/kreuzberg:latest \
  mcp --config /config/kreuzberg.toml
For production, use Compose with a persistent cache volume so embedding models don't re-download on restart:
services:
  kreuzberg-mcp:
    image: ghcr.io/kreuzberg-dev/kreuzberg:latest
    command: mcp --config /config/kreuzberg.toml
    volumes:
      - ./kreuzberg.toml:/config/kreuzberg.toml:ro
      - cache-data:/app/.kreuzberg
    restart: unless-stopped
volumes:
  cache-data:
What to Read Next
- API Server Guide — the HTTP REST API and detailed MCP tool reference
- Docker Deployment — container setup for all server modes
- Configuration Reference — every config option explained