---
priority: high
---
- Maintain at least 95% test coverage on core extraction code and 80% on bindings
- Test all format categories: text, office, PDF, images, archives, markup
- Test corrupted and malformed documents: extraction must fail gracefully and never panic
- Benchmark extraction speed per format and track regressions in CI
- Test both success and error paths for every extractor
- Use property-based testing for parsers with wide input ranges
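
As an illustration of the "fail gracefully, never panic" and property-based testing rules, here is a minimal sketch in Rust. Everything named in it (`ExtractError`, `extract_text`, `XorShift64`, `check_extract_property`) is hypothetical and not taken from this project's code. It uses a hand-rolled random generator only so the example has no dependencies; a real test suite would more likely use a crate such as `proptest` or `quickcheck`, which also shrink failing inputs.

```rust
use std::fmt;

/// Hypothetical error type for illustration; not the project's actual API.
#[derive(Debug, PartialEq)]
pub enum ExtractError {
    Empty,
    InvalidUtf8 { valid_up_to: usize },
}

impl fmt::Display for ExtractError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ExtractError::Empty => write!(f, "document is empty"),
            ExtractError::InvalidUtf8 { valid_up_to } => {
                write!(f, "invalid UTF-8 after byte {}", valid_up_to)
            }
        }
    }
}

impl std::error::Error for ExtractError {}

/// Hypothetical plain-text extractor: malformed input yields an `Err`,
/// never a panic.
pub fn extract_text(bytes: &[u8]) -> Result<String, ExtractError> {
    if bytes.is_empty() {
        return Err(ExtractError::Empty);
    }
    match std::str::from_utf8(bytes) {
        Ok(s) => Ok(s.to_owned()),
        Err(e) => Err(ExtractError::InvalidUtf8 {
            valid_up_to: e.valid_up_to(),
        }),
    }
}

/// Tiny xorshift64 generator so the sketch stays dependency-free.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // Xorshift must not start from zero, so substitute a fixed constant.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Property: for arbitrary bytes, extraction either returns text whose bytes
/// equal the input, or returns an error consistent with the input.
/// Returns (successes, errors) so callers can confirm both paths were hit.
pub fn check_extract_property(cases: usize, seed: u64) -> (usize, usize) {
    let mut rng = XorShift64::new(seed);
    let (mut ok, mut err) = (0, 0);
    for _ in 0..cases {
        let len = (rng.next_u64() % 64) as usize;
        let input: Vec<u8> = (0..len).map(|_| (rng.next_u64() & 0xFF) as u8).collect();
        match extract_text(&input) {
            Ok(text) => {
                assert_eq!(text.as_bytes(), input.as_slice());
                ok += 1;
            }
            Err(ExtractError::Empty) => {
                assert!(input.is_empty());
                err += 1;
            }
            Err(ExtractError::InvalidUtf8 { valid_up_to }) => {
                assert!(valid_up_to < input.len());
                err += 1;
            }
        }
    }
    (ok, err)
}
```

Calling `check_extract_property(1000, 42)` from a `#[test]` function exercises both the success and error paths with a fixed seed, so a failure is reproducible in CI.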