fil/.ai-rulez/domains/document-extraction/rules/cache-and-performance.md
Henrik Jess Nielsen b4c07d3693
Nomad changes
2026-06-01 23:40:55 +02:00

priority: high
  • Cache keys: content-hash based (hash of file bytes + config), not path-based
  • Invalidate cache when extraction config changes (output format, OCR settings, etc.)
  • Check cache before any extraction — cache hits should skip all processing
  • Concurrent batch processing: use a configurable worker pool that defaults to the CPU count
  • Stream large files instead of loading into memory — use AsyncRead where possible
  • Monitor cache hit rates — target >80% for repeated extractions
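
The caching rules above can be sketched roughly as follows. This is a minimal, std-only illustration, not the project's actual implementation: `ExtractionConfig`, `cache_key`, `ExtractionCache`, and `get_or_extract` are hypothetical names, and `DefaultHasher` stands in for whatever content hash the real cache uses. The key covers both the file bytes and the config, so changing the config produces a new key and old entries are never served for it.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical extraction settings. Every field that changes the output
/// must take part in the cache key.
#[derive(Hash, Clone, Debug, PartialEq, Eq)]
pub struct ExtractionConfig {
    pub output_format: String,
    pub ocr_enabled: bool,
    pub ocr_language: String,
}

/// Key derived from file bytes plus config, never from the file path.
/// Assumption: DefaultHasher keeps this sketch std-only. Its algorithm is
/// not guaranteed stable across Rust releases, so a persistent cache would
/// need a fixed hash such as SHA-256 instead.
pub fn cache_key(bytes: &[u8], config: &ExtractionConfig) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    config.hash(&mut hasher);
    hasher.finish()
}

/// In-memory cache that also counts hits and misses.
#[derive(Default)]
pub struct ExtractionCache {
    entries: HashMap<u64, String>,
    hits: u64,
    misses: u64,
}

impl ExtractionCache {
    /// Checks the cache first; `extract` runs only on a miss.
    pub fn get_or_extract<F>(
        &mut self,
        bytes: &[u8],
        config: &ExtractionConfig,
        extract: F,
    ) -> String
    where
        F: FnOnce(&[u8], &ExtractionConfig) -> String,
    {
        let key = cache_key(bytes, config);
        if let Some(cached) = self.entries.get(&key) {
            self.hits += 1;
            return cached.clone();
        }
        self.misses += 1;
        let result = extract(bytes, config);
        self.entries.insert(key, result.clone());
        result
    }

    /// Fraction of lookups served from the cache (0.0 before any lookup).
    pub fn hit_rate(&self) -> f64 {
        let total = self.hits + self.misses;
        if total == 0 {
            0.0
        } else {
            self.hits as f64 / total as f64
        }
    }
}
```

Calling `get_or_extract` twice with the same bytes and config runs the closure once; the second call is a hit. Comparing `hit_rate()` against the 0.8 target is one simple way to monitor the cache.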