fil/.ai-rulez/domains/ocr-integration/rules/ocr-table-and-hocr.md
Henrik Jess Nielsen b4c07d3693
Nomad changes
2026-06-01 23:40:55 +02:00

priority: high
  • hOCR parsing: extract word-level bounding boxes, confidence scores, and text content
  • Preserve spatial relationships from hOCR output for layout reconstruction
  • Table detection: use cell boundary detection (line detection + intersection analysis)
  • Validate grid structure before treating detected regions as tables
  • OCR each cell individually for better accuracy
  • Convert tables to markdown format with proper column alignment
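
The hOCR parsing, grid validation, and markdown conversion rules above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: it assumes Tesseract-style hOCR in which each word is an element with class `ocrx_word` and a `title` attribute carrying `bbox x0 y0 x1 y1` and optionally `x_wconf`. The names `Word`, `HocrWordParser`, `parse_hocr`, and `to_markdown` are hypothetical, and table detection itself (line detection and intersection analysis) is out of scope here.

```python
import re
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Optional


@dataclass
class Word:
    text: str
    bbox: tuple  # (x0, y0, x1, y1) in page pixels
    confidence: Optional[float]  # x_wconf value, None if absent


class HocrWordParser(HTMLParser):
    """Collects elements whose class list contains 'ocrx_word' (assumed layout)."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # (bbox, confidence) while inside a word element
        self._depth = 0       # nesting depth inside the current word element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if self._current is not None:
            self._depth += 1  # nested markup such as <strong> inside a word
            return
        attr = dict(attrs)
        if "ocrx_word" not in (attr.get("class") or "").split():
            return
        title = attr.get("title") or ""
        box = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", title)
        if box is None:
            return  # a word without coordinates is useless for layout
        conf = re.search(r"x_wconf (\d+(?:\.\d+)?)", title)
        self._current = (
            tuple(int(v) for v in box.groups()),
            float(conf.group(1)) if conf else None,
        )
        self._depth = 1
        self._chunks = []

    def handle_endtag(self, tag):
        if self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:
            bbox, conf = self._current
            self.words.append(Word("".join(self._chunks).strip(), bbox, conf))
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._chunks.append(data)


def parse_hocr(hocr: str) -> list:
    """Return words in document order, keeping their bounding boxes."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words


def to_markdown(rows: list) -> str:
    """Render a rectangular grid of cell strings; the first row is the header."""
    if not rows or not rows[0]:
        raise ValueError("empty table")
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("ragged grid: not treating region as a table")
    cells = [[c.strip().replace("|", "\\|") for c in r] for r in rows]
    col_w = [max(3, *(len(r[i]) for r in cells)) for i in range(width)]

    def line(row):
        return "| " + " | ".join(c.ljust(w) for c, w in zip(row, col_w)) + " |"

    sep = "| " + " | ".join("-" * w for w in col_w) + " |"
    return "\n".join([line(cells[0]), sep] + [line(r) for r in cells[1:]])
```

In this sketch, `to_markdown` refuses ragged input instead of padding it, which mirrors the rule about validating grid structure before treating a region as a table. Keeping `bbox` on every `Word` is what allows later layout reconstruction, for example grouping words into detected cells by coordinate containment.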