Security Policy
Supported Versions
Security fixes land on the main branch and ship in the latest release.
Fixes for vulnerabilities rated High or Critical are also back-ported to
the current minor series as patch releases. Older minor series receive no
security back-ports.
| Version | Supported |
|---|---|
| 5.x | Yes |
| < 5.0 | No |
Threat Model
Kreuzberg is a document-extraction library. Its principal threat is hostile input documents — files crafted to exhaust memory, CPU, or disk, or to exfiltrate data from the calling process.
Protected attack surfaces
| Threat | Mitigation |
|---|---|
| Decompression bombs (ZIP/OOXML/PDF) | ZipBombValidator enforces SecurityLimits.max_compression_ratio (default 100×) and max_archive_size (default 500 MiB) across all archive and OOXML paths. PDF embedded-file streams are checked for ratio and absolute size before recursive processing. |
| Oversized embedded files | ExtractionConfig.max_embedded_file_bytes (default 50 MiB) caps any single embedded attachment before recursive extraction is attempted. Applies to OOXML (DOCX/PPTX), email attachments, and PDF embedded files. |
| Runaway recursive extraction | ExtractionConfig.max_archive_depth (default 3) limits archive nesting depth to prevent infinite recursion on mutually-embedded documents. |
| Extraction timeout | ExtractionConfig.extraction_timeout_secs (default 60 s) wraps the entire extraction future in tokio::time::timeout. Pathological documents that take longer are cancelled with KreuzbergError::Timeout. |
| Content-size bombs (repeated paragraphs) | SecurityBudget (StringGrowthValidator) enforces SecurityLimits.max_content_size (default 100 MiB) on accumulated element text for XML-class formats, email, and PDF. |
| XML / HTML entity expansion (billion laughs) | EntityValidator (per-token) and StringGrowthValidator (cumulative) are wired into every XML/HTML parser path. |
| Deeply nested XML / DOM depth bombs | DepthValidator enforces SecurityLimits.max_xml_depth and max_nesting_depth (both default 1024). |
| Table cell bombs (CSV / XLSX / HTML tables) | TableValidator enforces SecurityLimits.max_table_cells (default 100 000). |
| Path traversal in ZIP archives | has_path_traversal() in extractors::security uses std::path::Component::ParentDir rather than a string search, catching normalised traversal patterns. |
| DDE / external-call formula injection (Excel) | The Excel extractor scans all string cells against a regex matching =DDE(, =WEBSERVICE(, =HYPERLINK(, and `=cmd |
| OLE compound file execution | OLE binary streams inside OOXML archives (recognised by the D0 CF 11 E0 magic) are skipped with a ProcessingWarning because kreuzberg has no safe OLE execution path. |
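The path-traversal row can be illustrated with a minimal, standard-library-only sketch. The helper name `contains_parent_dir` is hypothetical and this is not kreuzberg's actual `has_path_traversal()` implementation; it only shows why a check on `std::path::Component::ParentDir` behaves differently from a substring search for `../`.

```rust
use std::path::{Component, Path};

/// Hypothetical helper: returns true if any path component is `..`.
/// Illustrative only; not the library's `has_path_traversal()`.
fn contains_parent_dir(entry_name: &str) -> bool {
    Path::new(entry_name)
        .components()
        .any(|component| matches!(component, Component::ParentDir))
}

fn main() {
    // "a/b/.." has no "../" substring, yet it contains a parent-dir component.
    // "foo../bar" contains "../" as a substring, yet "foo.." is an ordinary name.
    let names = ["docs/report.txt", "../etc/passwd", "a/b/..", "foo../bar"];
    for name in names {
        println!("{name}: traversal = {}", contains_parent_dir(name));
    }
}
```

Note that `Path` splits on the platform separator, so on Unix a backslash-separated entry name is treated as a single component; a real validator would need to account for that separately.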
Out of scope
- Network requests: kreuzberg never makes outbound network requests. =WEBSERVICE(...) formulas and =HYPERLINK(...) cells generate warnings, but the URLs are never resolved.
- Macro execution: no VBA, JavaScript, or other macro runtime exists inside kreuzberg. Formula strings are read as data, not evaluated.
- Password-protected documents: encryption is not stripped; protected files are returned with an extraction error.
- Supply-chain / dependency vulnerabilities: report these directly to the dependency maintainer and open a GitHub advisory in this repo so we can update the pinned version.
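The "read as data, not evaluated" behaviour for formula cells can be sketched as follows. This is an assumption-laden illustration, not the Excel extractor's code: the policy describes a regex, whereas this sketch swaps in plain prefix matching to stay dependency-free, covers only the three function-call prefixes, and assumes matching is case-insensitive. The name `flagged_formula` is hypothetical.

```rust
/// Three of the prefixes the policy lists as flagged (the full list is longer).
const FLAGGED_PREFIXES: [&str; 3] = ["=DDE(", "=WEBSERVICE(", "=HYPERLINK("];

/// Hypothetical helper: returns the flagged prefix a cell starts with, if any.
/// Assumption: leading whitespace and letter case are ignored.
fn flagged_formula(cell: &str) -> Option<&'static str> {
    let upper = cell.trim_start().to_ascii_uppercase();
    FLAGGED_PREFIXES
        .iter()
        .copied()
        .find(|prefix| upper.starts_with(*prefix))
}

fn main() {
    let cells = [
        "Quarterly totals",
        "=WEBSERVICE(\"http://example.invalid/x\")",
        "=SUM(A1:A3)",
    ];
    for cell in cells {
        match flagged_formula(cell) {
            // The cell text is only inspected; nothing is evaluated or fetched.
            Some(prefix) => println!("warning: cell starts with {prefix}"),
            None => println!("ok: {cell}"),
        }
    }
}
```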
Configuring limits
Most limits live on ExtractionConfig.security_limits (the SecurityLimits
struct). max_embedded_file_bytes, max_archive_depth, and
extraction_timeout_secs are set directly on ExtractionConfig. The defaults
are chosen to be permissive enough for legitimate real-world documents
while blocking the most common DoS payloads. Set limits to None or very
large values only for input you fully trust.
```rust
use kreuzberg::{ExtractionConfig, extractors::security::SecurityLimits};

let config = ExtractionConfig {
    // Tighten limits for untrusted input from an upload endpoint.
    security_limits: Some(SecurityLimits {
        max_content_size: 10 * 1024 * 1024, // 10 MiB output cap
        max_compression_ratio: 50,          // 50× ratio cap
        max_table_cells: 10_000,
        ..SecurityLimits::default()
    }),
    max_embedded_file_bytes: Some(5 * 1024 * 1024), // 5 MiB per embedded file
    extraction_timeout_secs: Some(10),              // 10 s timeout
    ..ExtractionConfig::default()
};
```
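To get a feel for what tightening max_compression_ratio does, the arithmetic behind a ratio cap combined with an absolute size cap can be sketched with the standard library alone. The function `within_archive_limits` is hypothetical, and whether the real ZipBombValidator measures per entry or per archive is not specified here; the sketch only shows how the two limits interact.

```rust
const MIB: u64 = 1024 * 1024;

/// Hypothetical check combining a ratio cap with an absolute size cap.
/// Not the library's validator; sizes are in bytes.
fn within_archive_limits(compressed: u64, uncompressed: u64, max_ratio: u64, max_size: u64) -> bool {
    if uncompressed > max_size {
        return false;
    }
    // Equivalent to uncompressed / compressed <= max_ratio, written as a
    // multiplication to avoid dividing by zero; saturating_mul avoids overflow.
    uncompressed <= compressed.saturating_mul(max_ratio)
}

fn main() {
    // 1 MiB expanding to 60 MiB: allowed at the default 100x ratio, rejected at 50x.
    println!("{}", within_archive_limits(MIB, 60 * MIB, 100, 500 * MIB));
    println!("{}", within_archive_limits(MIB, 60 * MIB, 50, 500 * MIB));
    // 10 MiB expanding to 600 MiB: the ratio is only 60x, but the 500 MiB cap rejects it.
    println!("{}", within_archive_limits(10 * MIB, 600 * MIB, 100, 500 * MIB));
}
```

Halving the ratio cap, as in the configuration above, therefore rejects inputs in the 50× to 100× band that the defaults would accept, which is a reasonable trade for an upload endpoint but may be too strict for highly compressible legitimate documents.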
Reporting a Vulnerability
Do not open a public GitHub issue for security vulnerabilities.
Send a report to security@kreuzberg.dev with:
- A description of the vulnerability and affected versions.
- A minimal reproducer (if possible, a file that triggers the issue).
- Your assessment of severity (CVSS score or plain description).
- Whether you want public credit when the advisory is published.
We aim to acknowledge reports within 2 business days and to publish a fix within 14 calendar days for Critical/High issues and 30 days for Medium/Low. We will coordinate disclosure timing with you.
Researchers who follow responsible disclosure will be credited in the GitHub advisory unless they prefer to remain anonymous.