# Layout Detection v4.5.0

Detect document layout regions (tables, figures, headers, text blocks, etc.) in PDFs using ONNX-based deep learning models. This enables table extraction, figure isolation, reading-order reconstruction, and selective OCR.

!!! Note "Feature gate"

    Requires the `layout-detection` Cargo feature. Not included in the default feature set.

## Model

Layout detection uses the **RT-DETR v2** model, an ONNX-based deep learning model that detects 17 layout element classes: text blocks, titles, section headers, tables, pictures, captions, formulas, footnotes, list items, code, page headers, page footers, selected and unselected checkboxes, forms, key-value regions, and document indices.

### When to Enable

**Recommended for:** complex multi-column PDFs, scanned documents, academic papers, business forms, and any document where layout understanding improves extraction accuracy.

**Less beneficial for:** simple single-column text documents, high-throughput pipelines where latency is critical (consider GPU acceleration), or documents already well-handled by PDF structure trees.

### Performance Impact

| Pipeline | Structure F1 | Text F1 | Avg time/doc |
| -------- | ------------ | ------- | ------------ |
| Baseline | 33.9%        | 87.4%   | 447 ms       |
| Layout   | 41.1%        | 90.1%   | 1500 ms      |

_171-document PDF corpus, CPU only. GPU acceleration significantly reduces the time penalty._

!!! Note "Layout Detection Model"

    Kreuzberg uses only the RT-DETR v2 model for layout detection. The `preset` field is not available in `LayoutDetectionConfig`. Configure table structure recognition separately via `table_model`; see "Table Structure Models" below.
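To make the trade-off in the Performance Impact table concrete, the short calculation below derives the accuracy gain and latency cost from the four published figures. It is only arithmetic on the numbers in the table; the variable names are illustrative and not part of any Kreuzberg API.

```python
# Figures copied from the Performance Impact table (171-document corpus, CPU only).
baseline = {"structure_f1": 33.9, "text_f1": 87.4, "avg_ms": 447}
layout = {"structure_f1": 41.1, "text_f1": 90.1, "avg_ms": 1500}

# Accuracy gains in percentage points (rounded to hide float noise).
structure_gain = round(layout["structure_f1"] - baseline["structure_f1"], 1)
text_gain = round(layout["text_f1"] - baseline["text_f1"], 1)

# Latency cost: absolute extra time per document and relative slowdown.
extra_ms = layout["avg_ms"] - baseline["avg_ms"]
slowdown = round(layout["avg_ms"] / baseline["avg_ms"], 2)

print(f"Structure F1: +{structure_gain} points")
print(f"Text F1: +{text_gain} points")
print(f"Extra time per document: {extra_ms} ms ({slowdown}x slower)")
```

On CPU this works out to roughly 7 points of Structure F1 in exchange for a bit over three times the per-document latency, which is why latency-sensitive pipelines are pointed toward GPU acceleration.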
## Configuration

=== "Python"

    ```python
    import asyncio

    from kreuzberg import ExtractionConfig, LayoutDetectionConfig, extract_file

    config = ExtractionConfig(
        layout=LayoutDetectionConfig(
            confidence_threshold=0.5,
            apply_heuristics=True,
            table_model="tatr",
        )
    )


    async def main() -> None:
        # extract_file is awaited, so it has to run inside an event loop
        result = await extract_file("document.pdf", config=config)


    asyncio.run(main())
    ```

=== "TypeScript"

    ```typescript
    const result = await extract("document.pdf", {
      layout: {
        confidenceThreshold: 0.5,
        applyHeuristics: true,
        tableModel: "tatr",
      },
    });
    ```

=== "Rust"

    ```rust
    use kreuzberg::core::{ExtractionConfig, LayoutDetectionConfig};

    let config = ExtractionConfig {
        layout: Some(LayoutDetectionConfig {
            confidence_threshold: Some(0.5),
            apply_heuristics: true,
            table_model: Some("tatr".to_string()),
            ..Default::default()
        }),
        ..Default::default()
    };
    ```

=== "TOML"

    ```toml title="kreuzberg.toml"
    [layout]
    apply_heuristics = true
    # table_model = "tatr"
    ```

=== "CLI"

    ```bash title="Terminal"
    # Enable layout detection with default settings
    kreuzberg extract document.pdf --layout --content-format markdown

    # Custom confidence threshold
    kreuzberg extract document.pdf --layout-confidence 0.5 --content-format markdown

    # Specific table model
    kreuzberg extract document.pdf --layout --layout-table-model slanet_wired

    # Combined with GPU acceleration
    kreuzberg extract document.pdf --layout --acceleration coreml
    ```

See [LayoutDetectionConfig](../reference/configuration.md#layoutdetectionconfig) for all fields.

## Table Structure Models v4.5.3

When layout detection identifies a table region, a table structure model analyzes its rows, columns, headers, and spanning cells. Set `LayoutDetectionConfig.table_model` to one of:

| Value             | Notes                                                       |
| ----------------- | ----------------------------------------------------------- |
| `tatr`            | Default. Fast (~30 MB). General-purpose.                    |
| `slanet_wired`    | Higher accuracy for bordered/gridlined tables (~365 MB).    |
| `slanet_wireless` | Higher accuracy for borderless tables (~365 MB).            |
| `slanet_auto`     | Auto-classifies per page (~737 MB). Slowest.                |
| `slanet_plus`     | Smallest (~7.78 MB). For resource-constrained environments. |
| `disabled`        | Skip table structure recognition.                           |

!!! Note "Model Download"

    SLANeXT models are not downloaded by default. Use `cache warm --all-table-models` to pre-download them; otherwise they download automatically on first use.

## GPU Acceleration

Layout detection uses ONNX Runtime with automatic provider selection:

| Provider | Platform       | Notes                         |
| -------- | -------------- | ----------------------------- |
| CPU      | All            | Default, no setup needed      |
| CUDA     | Linux, Windows | Requires CUDA toolkit + cuDNN |
| CoreML   | macOS          | Automatic on Apple Silicon    |
| TensorRT | Linux          | Requires TensorRT             |

To override the automatic selection:

```python
from kreuzberg import AccelerationConfig, ExtractionConfig, LayoutDetectionConfig

# Force the CUDA provider on GPU 0 instead of relying on automatic selection
config = ExtractionConfig(
    layout=LayoutDetectionConfig(),
    acceleration=AccelerationConfig(provider="cuda", device_id=0),
)
```

See [AccelerationConfig reference](../reference/configuration.md#accelerationconfig) for details.

## Layout Classes

The RT-DETR v2 model detects 17 classes. Each `LayoutRegion.class_name` is one of:

`caption`, `footnote`, `formula`, `list_item`, `page_footer`, `page_header`, `picture`, `section_header`, `table`, `text`, `title`, `document_index`, `code`, `checkbox_selected`, `checkbox_unselected`, `form`, `key_value_region`.

See [`LayoutRegion`](../reference/types.md#layoutregion) in the types reference for the full field shape.

## Accessing Layout Regions

When both layout detection and page extraction are enabled, each page in the result includes `layout_regions`: a list of detected regions, each with a class, confidence score, bounding box, and area fraction. This enables programmatic filtering and analysis of specific layout elements.
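As a minimal sketch of that kind of analysis, the helper below counts confident detections per class. `Region` is a hypothetical stand-in that mirrors only the `class_name`, `confidence`, and `area_fraction` fields described above, so the snippet runs without Kreuzberg installed; with real results you would pass a page's `layout_regions` instead. The function name `count_classes` and the 0.8 default threshold are assumptions for illustration, not library defaults.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical stand-in for a detected layout region."""

    class_name: str
    confidence: float
    area_fraction: float


def count_classes(regions, min_confidence=0.8):
    """Count regions per class, ignoring detections below min_confidence."""
    return Counter(r.class_name for r in regions if r.confidence >= min_confidence)


# Made-up sample data for illustration only.
sample = [
    Region("table", 0.95, 0.30),
    Region("picture", 0.91, 0.25),
    Region("picture", 0.42, 0.02),  # dropped: below the threshold
    Region("text", 0.88, 0.10),
]

counts = count_classes(sample)
print(dict(counts))
```

A per-class summary like this is a cheap way to decide which pages deserve heavier processing, for example running table extraction only on pages where `counts["table"]` is non-zero.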
=== "Python"

    ```python
    import asyncio

    from kreuzberg import ExtractionConfig, LayoutDetectionConfig, PagesConfig, extract_file


    async def main() -> None:
        result = await extract_file(
            "document.pdf",
            config=ExtractionConfig(
                layout=LayoutDetectionConfig(),
                pages=PagesConfig(extract_pages=True),
            ),
        )

        # Report high-confidence pictures on each page
        for page in result.pages:
            if page.layout_regions:
                for region in page.layout_regions:
                    if region.class_name == "picture" and region.confidence > 0.9:
                        print(f"Page {page.page_number}: diagram detected "
                              f"(confidence={region.confidence:.2f}, "
                              f"area={region.area_fraction:.0%})")


    asyncio.run(main())
    ```

=== "TypeScript"

    ```typescript
    const result = await extract("document.pdf", {
      layout: {},
      pages: { extractPages: true },
    });

    // Report high-confidence pictures on each page
    for (const page of result.pages ?? []) {
      if (page.layoutRegions) {
        for (const region of page.layoutRegions) {
          if (region.className === "picture" && region.confidence > 0.9) {
            console.log(
              `Page ${page.pageNumber}: diagram detected ` +
                `(confidence=${region.confidence.toFixed(2)}, ` +
                `area=${(region.areaFraction * 100).toFixed(0)}%)`
            );
          }
        }
      }
    }
    ```

=== "Rust"

    ```rust
    use kreuzberg::core::{ExtractionConfig, LayoutDetectionConfig, PagesConfig};

    let result = extract_file(
        "document.pdf",
        ExtractionConfig {
            layout: Some(LayoutDetectionConfig::default()),
            pages: Some(PagesConfig {
                extract_pages: true,
                ..Default::default()
            }),
            ..Default::default()
        },
    ).await?;

    // Report high-confidence pictures on each page
    for page in &result.pages {
        if let Some(regions) = &page.layout_regions {
            for region in regions {
                if region.class_name == "picture" && region.confidence > 0.9 {
                    println!(
                        "Page {}: diagram detected (confidence={:.2}, area={:.0}%)",
                        page.page_number,
                        region.confidence,
                        region.area_fraction * 100.0
                    );
                }
            }
        }
    }
    ```

### Tips

- Use `confidence` to filter out low-confidence detections; a threshold of 0.8 to 0.9 is typical for downstream operations
- Use `area_fraction` to distinguish between inline images and full-page diagrams (e.g., `area_fraction > 0.1` for significant figures)
- Layout detection and page extraction are configured independently; enable both to access both content and layout structure
- Available across all bindings (Python, TypeScript, Rust, Ruby, Java, Go, Elixir, C#, PHP)

## Acknowledgments

- **[Docling](https://github.com/DS4SD/docling)**: RT-DETR v2 model and layout classification approach
- **[TATR](https://github.com/microsoft/table-transformer)**: Table structure recognition with ONNX
- **[PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)**: SLANeXT table structure and PP-LCNet classifier models

## Related

- [Configuration Reference](../reference/configuration.md#layoutdetectionconfig): full field reference
- [Element-Based Output](output-formats.md#element-based-output-v410): using layout-aware results