fil/.ai-rulez/domains/document-extraction/rules/cache-and-performance.md at b4c07d36934823e7b674ed498e966d1583a7b4bc

Henrik Jess Nielsen b4c07d3693

Nomad changes

2026-06-01 23:40:55 +02:00

priority: high

- Cache keys are content-based, not path-based: hash the file bytes together with the extraction config.
- Invalidate cached results whenever the extraction config changes (output format, OCR settings, etc.).
- Check the cache before any extraction; a cache hit should skip all processing.
- Concurrent batch processing: use a configurable worker pool that defaults to the CPU count.
- Stream large files instead of loading them into memory; use AsyncRead where possible.
- Monitor cache hit rates; target >80% for repeated extractions.
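
The rules above can be sketched in a few lines. This is a minimal, language-neutral illustration in Python, not the project's implementation: `cache_key`, `ExtractionCache`, `extract_batch`, the `extract` callback, and the config keys in the usage note are all hypothetical names, SHA-256 is an assumed choice of hash, and the in-memory dict stands in for whatever cache store is actually used. The config is assumed to be JSON-serializable.

```python
import hashlib
import json
import os
import threading
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so large files are never fully in memory


def cache_key(path, config):
    """Hash the extraction config plus the file bytes; the path is not part of the key."""
    digest = hashlib.sha256()
    # Canonical JSON (sorted keys) so equal configs always produce the same bytes.
    digest.update(json.dumps(config, sort_keys=True).encode("utf-8"))
    digest.update(b"\0")  # separator between the config and the file content
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


class ExtractionCache:
    """In-memory cache (hypothetical) that also counts hits and misses."""

    def __init__(self):
        self._entries = {}
        self._lock = threading.Lock()
        self.hits = 0
        self.misses = 0

    def get_or_extract(self, path, config, extract):
        key = cache_key(path, config)
        with self._lock:
            if key in self._entries:
                self.hits += 1
                return self._entries[key]  # cache hit: skip all processing
            self.misses += 1
        # Extraction runs outside the lock, so two workers that miss on the
        # same content at the same moment may both extract it.
        result = extract(path, config)
        with self._lock:
            self._entries[key] = result
        return result

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def extract_batch(cache, paths, config, extract, workers=None):
    """Process a batch with a configurable worker pool, defaulting to the CPU count."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: cache.get_or_extract(p, config, extract), paths))
```

Because the config is part of the key, changing any setting (for example a hypothetical `{"output_format": "markdown", "ocr": True}` instead of `"ocr": False`) yields a different key, so stale results are never returned and no explicit invalidation pass is needed in this sketch. Two files with identical bytes share one entry regardless of their paths, and `hit_rate()` gives the figure to compare against the 80% target.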