Nomad changes
All checks were successful
Deploy fil (kreuzberg) / deploy (push) Successful in 49s

This commit is contained in:
Henrik Jess Nielsen
2026-06-01 23:40:55 +02:00
parent 72b1a0a6ed
commit b4c07d3693
5723 changed files with 1130655 additions and 0 deletions

@@ -0,0 +1,242 @@
# Layout Detection <span class="version-badge">v4.5.0</span>
Detect document layout regions (tables, figures, headers, text blocks, etc.) in PDFs using ONNX-based deep learning models. Enables table extraction, figure isolation, reading-order reconstruction, and selective OCR.
!!! Note "Feature gate"
    Requires the `layout-detection` Cargo feature. Not included in the default feature set.
## Model
Layout detection uses **RT-DETR v2**, an ONNX model that detects 17 layout element classes: text, titles, section headers, tables, pictures, captions, footnotes, formulas, list items, page headers, page footers, code, document indices, selected and unselected checkboxes, forms, and key-value regions.
### When to Enable
**Recommended for:** complex multi-column PDFs, scanned documents, academic papers, business forms, and any document where layout understanding improves extraction accuracy.
**Less beneficial for:** simple single-column text documents, high-throughput pipelines where latency is critical (consider GPU acceleration), or documents already well-handled by PDF structure trees.
### Performance Impact
| Pipeline | Structure F1 | Text F1 | Avg time/doc |
| -------- | ------------ | ------- | ------------ |
| Baseline | 33.9% | 87.4% | 447 ms |
| Layout | 41.1% | 90.1% | 1500 ms |
_171-document PDF corpus, CPU only. GPU acceleration significantly reduces the time penalty._
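To make the trade-off in the table concrete, the short sketch below computes the F1 gains and the time overhead from the published figures. The dictionary keys and the `tradeoff` helper are illustrative names, not part of Kreuzberg.

```python
# Figures copied from the benchmark table above (171-document PDF corpus, CPU only).
baseline = {"structure_f1": 33.9, "text_f1": 87.4, "ms_per_doc": 447}
layout = {"structure_f1": 41.1, "text_f1": 90.1, "ms_per_doc": 1500}


def tradeoff(base: dict, candidate: dict) -> dict:
    """Return F1 gains in percentage points plus the absolute and relative time cost."""
    return {
        "structure_f1_gain_pts": round(candidate["structure_f1"] - base["structure_f1"], 1),
        "text_f1_gain_pts": round(candidate["text_f1"] - base["text_f1"], 1),
        "extra_ms_per_doc": candidate["ms_per_doc"] - base["ms_per_doc"],
        "slowdown_factor": round(candidate["ms_per_doc"] / base["ms_per_doc"], 2),
    }


summary = tradeoff(baseline, layout)
print(summary)
```

On these numbers, layout detection gains 7.2 points of structure F1 and 2.7 points of text F1, at a cost of about 1053 ms per document (roughly 3.36 times the baseline time) on CPU.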
!!! Note "Layout Detection Model"
    Kreuzberg uses only the RT-DETR v2 model for layout detection. The `preset` field is not available in `LayoutDetectionConfig`. Configure table structure recognition separately via `table_model`; see "Table Structure Models" below.
## Configuration
=== "Python"
```python
import asyncio

from kreuzberg import ExtractionConfig, LayoutDetectionConfig, extract_file

config = ExtractionConfig(
    layout=LayoutDetectionConfig(
        confidence_threshold=0.5,
        apply_heuristics=True,
        table_model="tatr",
    )
)


async def main():
    # extract_file is awaited, so it must run inside an event loop
    return await extract_file("document.pdf", config=config)


result = asyncio.run(main())
```
=== "TypeScript"
```typescript
const result = await extract("document.pdf", {
  layout: {
    confidenceThreshold: 0.5,
    applyHeuristics: true,
    tableModel: "tatr",
  },
});
```
=== "Rust"
```rust
use kreuzberg::core::{ExtractionConfig, LayoutDetectionConfig};

let config = ExtractionConfig {
    layout: Some(LayoutDetectionConfig {
        confidence_threshold: Some(0.5),
        apply_heuristics: true,
        table_model: Some("tatr".to_string()),
        ..Default::default()
    }),
    ..Default::default()
};
```
=== "TOML"
```toml title="kreuzberg.toml"
[layout]
apply_heuristics = true
# table_model = "tatr"
```
=== "CLI"
```bash title="Terminal"
# Enable layout detection with default settings
kreuzberg extract document.pdf --layout --content-format markdown
# Custom confidence threshold
kreuzberg extract document.pdf --layout-confidence 0.5 --content-format markdown
# Specific table model
kreuzberg extract document.pdf --layout --layout-table-model slanet_wired
# Combined with GPU acceleration
kreuzberg extract document.pdf --layout --acceleration coreml
```
See [LayoutDetectionConfig](../reference/configuration.md#layoutdetectionconfig) for all fields.
## Table Structure Models <span class="version-badge">v4.5.3</span>
When layout detection identifies a table region, a table structure model analyzes rows, columns, headers, and spanning cells. Set `LayoutDetectionConfig.table_model` to one of:
| Value | Notes |
| ----------------- | ----------------------------------------------------------- |
| `tatr` | Default. Fast (~30 MB). General-purpose. |
| `slanet_wired` | Higher accuracy for bordered/gridlined tables (~365 MB). |
| `slanet_wireless` | Higher accuracy for borderless tables (~365 MB). |
| `slanet_auto` | Auto-classifies per page (~737 MB). Slowest. |
| `slanet_plus` | Smallest (~7.78 MB). For resource-constrained environments. |
| `disabled` | Skip table structure recognition. |
!!! Note "Model Download"
    SLANeXT models are not downloaded by default. Use `cache warm --all-table-models` to pre-download them; otherwise they download automatically on first use.
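As a rough guide to choosing among these values, the sketch below picks a `table_model` string from the table's border style and a download-size budget. The `pick_table_model` helper and its preference order are assumptions made for illustration; only the model names and approximate sizes come from the table above.

```python
from typing import Optional

# Approximate download sizes in MB, copied from the table above.
TABLE_MODEL_SIZES_MB = {
    "tatr": 30.0,
    "slanet_wired": 365.0,
    "slanet_wireless": 365.0,
    "slanet_auto": 737.0,
    "slanet_plus": 7.78,
}


def pick_table_model(bordered: Optional[bool], max_size_mb: float) -> str:
    """Hypothetical helper: choose a `table_model` value that fits a size budget.

    bordered=True  -> tables with visible gridlines
    bordered=False -> borderless tables
    bordered=None  -> unknown or mixed
    """
    if bordered is True:
        preferred = ["slanet_wired", "tatr", "slanet_plus"]
    elif bordered is False:
        preferred = ["slanet_wireless", "tatr", "slanet_plus"]
    else:
        preferred = ["slanet_auto", "tatr", "slanet_plus"]

    # Take the first preferred model whose download fits the budget.
    for name in preferred:
        if TABLE_MODEL_SIZES_MB[name] <= max_size_mb:
            return name
    # Nothing fits, so skip table structure recognition entirely.
    return "disabled"


print(pick_table_model(bordered=True, max_size_mb=500))
```

The returned string can be passed as `table_model` in `LayoutDetectionConfig`. The fallback order (specialised model, then `tatr`, then `slanet_plus`) is one reasonable policy, not a recommendation from the Kreuzberg project.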
## GPU Acceleration
Layout detection uses ONNX Runtime with automatic provider selection:
| Provider | Platform | Notes |
| -------- | -------------- | ----------------------------- |
| CPU | All | Default, no setup needed |
| CUDA | Linux, Windows | Requires CUDA toolkit + cuDNN |
| CoreML | macOS | Automatic on Apple Silicon |
| TensorRT | Linux | Requires TensorRT |
To override:
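```python
# AccelerationConfig is assumed to be exported from the top-level package
# alongside the other config classes
from kreuzberg import AccelerationConfig, ExtractionConfig, LayoutDetectionConfig

config = ExtractionConfig(
    layout=LayoutDetectionConfig(),
    # Request the CUDA provider on device 0 instead of automatic selection
    acceleration=AccelerationConfig(provider="cuda", device_id=0),
)
```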
See [AccelerationConfig reference](../reference/configuration.md#accelerationconfig) for details.
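The platform column above can be turned into a small lookup that lists which providers are worth trying on the current machine. This is only a sketch: it does not reproduce ONNX Runtime's own selection logic, the `candidate_providers` helper is hypothetical, and the provider strings `"tensorrt"` and `"cpu"` are assumed by analogy with the `"cuda"` and `coreml` values shown on this page.

```python
import platform
from typing import List, Optional

# Operating systems per provider, taken from the table above.
# platform.system() reports macOS as "Darwin".
PROVIDER_PLATFORMS = {
    "tensorrt": {"Linux"},
    "cuda": {"Linux", "Windows"},
    "coreml": {"Darwin"},
}


def candidate_providers(system: Optional[str] = None) -> List[str]:
    """Hypothetical helper: providers to try on this OS, with CPU as the last resort."""
    system = system or platform.system()
    candidates = [
        name for name, systems in PROVIDER_PLATFORMS.items() if system in systems
    ]
    candidates.append("cpu")  # CPU is available on every platform
    return candidates


print(candidate_providers())
```

A platform match only means the provider can exist on that OS. CUDA and TensorRT still need their runtime libraries installed, as noted in the table.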
## Layout Classes
The RT-DETR v2 model detects 17 classes. Each `LayoutRegion.class_name` is one of:
`caption`, `footnote`, `formula`, `list_item`, `page_footer`, `page_header`, `picture`, `section_header`, `table`, `text`, `title`, `document_index`, `code`, `checkbox_selected`, `checkbox_unselected`, `form`, `key_value_region`.
See [`LayoutRegion`](../reference/types.md#layoutregion) in the types reference for the full field shape.
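Because filters compare `class_name` against plain strings, a typo or a near-synonym (for example `figure` instead of `picture`) silently matches nothing. A minimal sketch of a guard, assuming the 17 names listed above are the complete set; `LAYOUT_CLASSES` and `unknown_class_names` are illustrative names, not part of the Kreuzberg API:

```python
# The 17 class names listed above.
LAYOUT_CLASSES = frozenset({
    "caption", "footnote", "formula", "list_item", "page_footer",
    "page_header", "picture", "section_header", "table", "text",
    "title", "document_index", "code", "checkbox_selected",
    "checkbox_unselected", "form", "key_value_region",
})


def unknown_class_names(names):
    """Return, in sorted order, the names that are not valid layout classes."""
    return sorted(set(names) - LAYOUT_CLASSES)


# "figure" and "header" look plausible but are not class names.
print(unknown_class_names(["table", "figure", "header"]))  # ['figure', 'header']
```

Running a check like this once, when a filter is configured, is cheaper than debugging an empty result set later.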
## Accessing Layout Regions
When both layout detection and page extraction are enabled, each page in the result includes `layout_regions`: a list of detected regions, each with a class name, confidence score, bounding box, and area fraction. This enables programmatic filtering and analysis of specific layout elements.
=== "Python"
```python
import asyncio

from kreuzberg import extract_file, ExtractionConfig, LayoutDetectionConfig, PagesConfig


async def main() -> None:
    result = await extract_file(
        "document.pdf",
        config=ExtractionConfig(
            layout=LayoutDetectionConfig(),
            pages=PagesConfig(extract_pages=True),
        ),
    )

    for page in result.pages:
        if page.layout_regions:
            for region in page.layout_regions:
                # Report only high-confidence picture regions
                if region.class_name == "picture" and region.confidence > 0.9:
                    print(f"Page {page.page_number}: diagram detected "
                          f"(confidence={region.confidence:.2f}, "
                          f"area={region.area_fraction:.0%})")


asyncio.run(main())
```
=== "TypeScript"
```typescript
const result = await extract("document.pdf", {
  layout: {},
  pages: { extractPages: true },
});

for (const page of result.pages ?? []) {
  if (page.layoutRegions) {
    for (const region of page.layoutRegions) {
      if (region.className === "picture" && region.confidence > 0.9) {
        console.log(
          `Page ${page.pageNumber}: diagram detected ` +
            `(confidence=${region.confidence.toFixed(2)}, ` +
            `area=${(region.areaFraction * 100).toFixed(0)}%)`
        );
      }
    }
  }
}
```
=== "Rust"
```rust
use kreuzberg::core::{ExtractionConfig, LayoutDetectionConfig, PagesConfig};

let result = extract_file(
    "document.pdf",
    ExtractionConfig {
        layout: Some(LayoutDetectionConfig::default()),
        pages: Some(PagesConfig {
            extract_pages: true,
            ..Default::default()
        }),
        ..Default::default()
    },
).await?;

for page in &result.pages {
    if let Some(regions) = &page.layout_regions {
        for region in regions {
            if region.class_name == "picture" && region.confidence > 0.9 {
                println!(
                    "Page {}: diagram detected (confidence={:.2}, area={:.0}%)",
                    page.page_number,
                    region.confidence,
                    region.area_fraction * 100.0
                );
            }
        }
    }
}
```
### Tips
- Use `confidence` to filter out low-confidence detections; a threshold of 0.8 to 0.9 is typical for downstream operations
- Use `area_fraction` to distinguish between inline images and full-page diagrams (e.g., `area_fraction > 0.1` for significant figures)
- Regions are attached to per-page results, so enable both layout detection and page extraction to access content and layout structure together
- Available across all bindings (Python, TypeScript, Rust, Ruby, Java, Go, Elixir, C#, PHP)
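The first two tips can be combined into one filter. The sketch below uses a stand-in `Region` dataclass with only the three fields it needs, so it runs without Kreuzberg installed; the real `LayoutRegion` type has more fields, and the `significant_figures` helper and its default thresholds are illustrative choices rather than library defaults.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Region:
    """Stand-in for LayoutRegion with just the fields this example reads."""
    class_name: str
    confidence: float
    area_fraction: float


def significant_figures(
    regions: Iterable[Region],
    min_confidence: float = 0.8,
    min_area: float = 0.1,
) -> List[Region]:
    """Keep confident `picture` regions that cover more than `min_area` of the page."""
    return [
        r for r in regions
        if r.class_name == "picture"
        and r.confidence >= min_confidence
        and r.area_fraction > min_area
    ]


regions = [
    Region("picture", 0.95, 0.40),  # large, confident: kept
    Region("picture", 0.92, 0.02),  # confident but tiny (inline icon): dropped
    Region("picture", 0.55, 0.30),  # large but low confidence: dropped
    Region("table", 0.97, 0.25),    # not a picture: dropped
]

kept = significant_figures(regions)
print([(r.class_name, r.confidence) for r in kept])  # [('picture', 0.95)]
```

Because the function reads only `class_name`, `confidence`, and `area_fraction`, the same filter should work on the objects in `page.layout_regions`, provided they expose those attributes as shown in the Python example above.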
## Acknowledgments
- **[Docling](https://github.com/DS4SD/docling)** — RT-DETR v2 model and layout classification approach
- **[TATR](https://github.com/microsoft/table-transformer)** — Table structure recognition with ONNX
- **[PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)** — SLANeXT table structure and PP-LCNet classifier models
## Related
- [Configuration Reference](../reference/configuration.md#layoutdetectionconfig) — full field reference
- [Element-Based Output](output-formats.md#element-based-output-v410) — using layout-aware results