hjess/fil: `.ai-rulez/domains/ocr-integration/agents/ocr-engineer.md` at 2620d6b50d655aadbfda098fd570a34c8d7c5c09

Last commit b4c07d3693 by Henrik Jess Nielsen, 2026-06-01 23:40:55 +02:00: "Nomad changes"

| name | description | model |
| --- | --- | --- |
| ocr-engineer | OCR pipeline development, backend integration, and table reconstruction | haiku |

When working on OCR code:

- Key source paths: `crates/kreuzberg/src/ocr/` (`processor.rs`, `tesseract_backend.rs`, `hocr.rs`, `cache.rs`, `language_registry.rs`, `table/`)
- OCR pipeline: Image Detection -> Preprocessing (denoise, deskew, binarize) -> Backend Selection -> OCR Execution -> hOCR Parsing -> Table Reconstruction -> Caching -> Return
- Backends: Tesseract (default; native C FFI via leptess), PaddleOCR (ONNX via `ort`), EasyOCR (Python via PyO3)
- Python backends: run calls inside `tokio::task::spawn_blocking`. Minimize GIL hold time by releasing it with `py.allow_threads()` during pure-Rust work. Cache Python-derived data in Rust fields instead of re-reading it from Python objects on every call.
- Table detection: detect line/cell boundaries, validate the grid structure, OCR each cell, and output the table as Markdown
- Language management: validate language codes against `LanguageRegistry` and check that the required tessdata files are available
- Caching: key OCR results by `hash(image_bytes + language + config)`
- hOCR parsing: use the `hocr` module to extract word-level bounding boxes and confidence scores
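The binarize stage of preprocessing turns a grayscale page into dark text on a light background before OCR. The sketch below uses Otsu's global threshold, which picks the cut that maximizes the between-class variance of the intensity histogram. It is an assumption that kreuzberg uses Otsu rather than an adaptive threshold or something else. `otsu_threshold` and `binarize` are hypothetical helpers that work on a plain 8-bit grayscale buffer, not on the crate's image types.

```rust
/// Otsu's global threshold for an 8-bit grayscale buffer (hypothetical helper).
/// Pixels strictly above the returned value are treated as background (white).
pub fn otsu_threshold(pixels: &[u8]) -> u8 {
    if pixels.is_empty() {
        return 0;
    }
    // Histogram over the 256 possible intensity levels.
    let mut hist = [0u64; 256];
    for &p in pixels {
        hist[p as usize] += 1;
    }
    let total = pixels.len() as f64;
    // Sum of all intensities, used to derive the mean of the upper class.
    let sum_all: f64 = hist
        .iter()
        .enumerate()
        .map(|(i, &count)| i as f64 * count as f64)
        .sum();

    let mut weight_low = 0.0_f64; // pixel count with intensity <= t
    let mut sum_low = 0.0_f64; // intensity sum of those pixels
    let mut best_t = 0u8;
    let mut best_between = -1.0_f64;
    for (t, &count) in hist.iter().enumerate() {
        weight_low += count as f64;
        sum_low += t as f64 * count as f64;
        if weight_low == 0.0 {
            continue; // no pixels at or below t yet
        }
        let weight_high = total - weight_low;
        if weight_high == 0.0 {
            break; // every pixel is already in the lower class
        }
        let mean_low = sum_low / weight_low;
        let mean_high = (sum_all - sum_low) / weight_high;
        // Between-class variance, scaled by total^2 (scaling does not move the argmax).
        let between = weight_low * weight_high * (mean_low - mean_high).powi(2);
        if between > best_between {
            best_between = between;
            best_t = t as u8;
        }
    }
    best_t
}

/// Maps each pixel to 0 (ink) or 255 (paper) using the Otsu threshold.
pub fn binarize(pixels: &[u8]) -> Vec<u8> {
    let t = otsu_threshold(pixels);
    pixels.iter().map(|&p| if p > t { 255 } else { 0 }).collect()
}
```

A single global threshold is cheap, but it struggles with uneven lighting. Adaptive, per-tile thresholding is the usual fallback for photographed pages.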
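Backend selection, with Tesseract as the default, could be modeled as a small enum. In this sketch, `OcrBackend`, its accepted name strings, `requires_python`, and `select_backend` are all hypothetical and are not kreuzberg's actual types. `requires_python` flags the backend that, per the guidance above, goes through `tokio::task::spawn_blocking` with minimal GIL hold time.

```rust
use std::str::FromStr;

/// Hypothetical enum mirroring the three backends listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum OcrBackend {
    /// Native Tesseract via C FFI; the default when nothing is requested.
    #[default]
    Tesseract,
    /// PaddleOCR models executed through ONNX Runtime.
    PaddleOcr,
    /// EasyOCR called through the Python interpreter via PyO3.
    EasyOcr,
}

impl OcrBackend {
    /// True for backends that call into Python and therefore need the GIL.
    pub fn requires_python(self) -> bool {
        matches!(self, OcrBackend::EasyOcr)
    }
}

impl FromStr for OcrBackend {
    type Err = String;

    /// Parses a case-insensitive backend name (names are assumptions).
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "tesseract" => Ok(Self::Tesseract),
            "paddleocr" => Ok(Self::PaddleOcr),
            "easyocr" => Ok(Self::EasyOcr),
            other => Err(format!("unknown OCR backend: {other}")),
        }
    }
}

/// Falls back to the default backend when the caller requests none.
pub fn select_backend(requested: Option<&str>) -> Result<OcrBackend, String> {
    requested.map_or(Ok(OcrBackend::default()), |name| name.parse())
}
```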
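The table-detection step (find boundaries, validate the grid, OCR each cell, emit Markdown) can be sketched with plain data types. The sketch assumes boundary detection has already produced the y positions of horizontal rules and the x positions of vertical rules. `Cell`, `build_grid`, `to_markdown`, and `reconstruct_table` are hypothetical names. The per-cell OCR call is passed in as a closure, and the first row is assumed to be the Markdown header.

```rust
/// A cell rectangle in pixel coordinates.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Cell {
    pub left: u32,
    pub top: u32,
    pub right: u32,
    pub bottom: u32,
}

/// Sorts rule positions and merges ones closer than `min_gap` (e.g. a thick rule seen twice).
fn normalize(mut lines: Vec<u32>, min_gap: u32) -> Vec<u32> {
    lines.sort_unstable();
    lines.dedup_by(|a, b| (*a).abs_diff(*b) < min_gap);
    lines
}

/// Builds rows of cells from rule positions; None if they do not form at least a 1x1 grid.
pub fn build_grid(ys: Vec<u32>, xs: Vec<u32>, min_gap: u32) -> Option<Vec<Vec<Cell>>> {
    let ys = normalize(ys, min_gap);
    let xs = normalize(xs, min_gap);
    if ys.len() < 2 || xs.len() < 2 {
        return None;
    }
    let grid: Vec<Vec<Cell>> = ys
        .windows(2)
        .map(|row| {
            xs.windows(2)
                .map(|col| Cell { left: col[0], top: row[0], right: col[1], bottom: row[1] })
                .collect::<Vec<_>>()
        })
        .collect();
    Some(grid)
}

/// Renders a rectangular grid of cell texts as a Markdown table; None if rows are ragged.
pub fn to_markdown(rows: &[Vec<String>]) -> Option<String> {
    let cols = rows.first()?.len();
    if cols == 0 || rows.iter().any(|r| r.len() != cols) {
        return None;
    }
    // Escape pipes and flatten newlines so each cell stays inside its column.
    let fmt_row = |r: &Vec<String>| {
        let cells: Vec<String> = r
            .iter()
            .map(|c| c.trim().replace('|', "\\|").replace('\n', " "))
            .collect();
        format!("| {} |", cells.join(" | "))
    };
    let mut out = vec![fmt_row(&rows[0]), format!("|{}", " --- |".repeat(cols))];
    for r in &rows[1..] {
        out.push(fmt_row(r));
    }
    Some(out.join("\n"))
}

/// Runs the supplied OCR function on every cell, then renders the result.
pub fn reconstruct_table<F: Fn(&Cell) -> String>(grid: &[Vec<Cell>], ocr_cell: F) -> Option<String> {
    let texts: Vec<Vec<String>> = grid
        .iter()
        .map(|row| row.iter().map(&ocr_cell).collect())
        .collect();
    to_markdown(&texts)
}
```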
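Language validation can be sketched as two checks for each code in a Tesseract-style spec such as `eng+deu`. First, the registry must know the code. Second, `<code>.traineddata` must exist in the tessdata directory. The `LanguageRegistry` below is a hypothetical stand-in backed by a `HashSet`, not the real type's API, and `validate_languages` is likewise illustrative.

```rust
use std::collections::HashSet;
use std::path::Path;

/// Hypothetical stand-in for LanguageRegistry: the set of supported codes.
pub struct LanguageRegistry {
    supported: HashSet<String>,
}

impl LanguageRegistry {
    pub fn new(codes: &[&str]) -> Self {
        Self { supported: codes.iter().map(|c| c.to_string()).collect() }
    }

    pub fn is_supported(&self, code: &str) -> bool {
        self.supported.contains(code)
    }
}

/// Validates every code in a "+"-separated spec and returns them in order.
pub fn validate_languages(
    spec: &str,
    registry: &LanguageRegistry,
    tessdata_dir: &Path,
) -> Result<Vec<String>, String> {
    let mut langs = Vec::new();
    for code in spec.split('+').map(str::trim) {
        if code.is_empty() {
            return Err(format!("empty language code in {spec:?}"));
        }
        if !registry.is_supported(code) {
            return Err(format!("unsupported language: {code}"));
        }
        // Tesseract looks for <code>.traineddata in its tessdata directory.
        let traineddata = tessdata_dir.join(format!("{code}.traineddata"));
        if !traineddata.is_file() {
            return Err(format!("missing tessdata: {}", traineddata.display()));
        }
        langs.push(code.to_string());
    }
    Ok(langs)
}
```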
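A minimal sketch of the cache key using only the standard library. It assumes a hypothetical `OcrConfig` whose fields stand in for whatever settings affect OCR output. Hashing each input through `Hash`, instead of concatenating raw bytes, keeps inputs like `("ab", "c")` and `("a", "bc")` distinct, because byte slices are hashed with a length prefix and `str` values with a terminator.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical subset of the OCR settings that change the output.
#[derive(Debug, Clone, Hash)]
pub struct OcrConfig {
    pub backend: String,
    pub psm: u8,
    pub dpi: u32,
}

/// Combines image bytes, language, and config into one 64-bit key.
pub fn cache_key(image_bytes: &[u8], language: &str, config: &OcrConfig) -> u64 {
    let mut hasher = DefaultHasher::new();
    image_bytes.hash(&mut hasher); // length-prefixed, so field boundaries are preserved
    language.hash(&mut hasher);
    config.hash(&mut hasher);
    hasher.finish()
}
```

`DefaultHasher`'s algorithm is not guaranteed to stay the same across Rust releases. A cache persisted to disk would therefore want a stable digest, such as SHA-256 or BLAKE3, fed with the same inputs.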
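In Tesseract's hOCR output, each word is a `span` with class `ocrx_word`. Its `title` attribute carries properties such as `bbox x0 y0 x1 y1; x_wconf 93`. The sketch below extracts those values with plain string scanning. It assumes Tesseract's single-quoted attributes and word text without nested tags, and it does not decode HTML entities. `HocrWord`, `extract_words`, and `parse_title` are hypothetical, not the `hocr` module's real API. A production parser would use a proper HTML parser.

```rust
/// One recognized word with its box (x0, y0, x1, y1) and confidence (0-100).
#[derive(Debug, Clone, PartialEq)]
pub struct HocrWord {
    pub text: String,
    pub bbox: (u32, u32, u32, u32),
    pub confidence: f32,
}

/// Parses a title like "bbox 36 92 96 116; x_wconf 90"; None if either property is missing.
pub fn parse_title(title: &str) -> Option<((u32, u32, u32, u32), f32)> {
    let (mut bbox, mut conf) = (None, None);
    for prop in title.split(';').map(str::trim) {
        let mut parts = prop.split_whitespace();
        match parts.next() {
            Some("bbox") => {
                let v: Vec<u32> = parts.map(|p| p.parse().ok()).collect::<Option<_>>()?;
                if v.len() != 4 {
                    return None;
                }
                bbox = Some((v[0], v[1], v[2], v[3]));
            }
            Some("x_wconf") => conf = parts.next()?.parse().ok(),
            _ => {} // ignore other properties (baseline, x_size, ...)
        }
    }
    Some((bbox?, conf?))
}

/// Returns the value of a single-quoted attribute inside an opening tag.
fn attr<'a>(tag: &'a str, name: &str) -> Option<&'a str> {
    let key = format!("{name}='");
    let start = tag.find(&key)? + key.len();
    let len = tag[start..].find('\'')?;
    Some(&tag[start..start + len])
}

/// Scans an hOCR document for ocrx_word spans.
pub fn extract_words(hocr: &str) -> Vec<HocrWord> {
    let mut words = Vec::new();
    let mut rest = hocr;
    while let Some(start) = rest.find("<span class='ocrx_word'") {
        rest = &rest[start..];
        let Some(tag_end) = rest.find('>') else { break };
        let tag = &rest[..tag_end];
        let after = &rest[tag_end + 1..];
        let Some(close) = after.find("</span>") else { break };
        let text = after[..close].trim().to_string();
        if let Some((bbox, confidence)) = attr(tag, "title").and_then(parse_title) {
            words.push(HocrWord { text, bbox, confidence });
        }
        rest = &after[close..]; // always advances past the current opening tag
    }
    words
}
```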