.ai-rulez/domains/document-extraction/rules/cache-and-performance.md

---
priority: high
---

- Cache keys: content-hash based (hash of file bytes + config), not path-based
- Invalidate cache when extraction config changes (output format, OCR settings, etc.)
- Check cache before any extraction — cache hits should skip all processing
- Concurrent batch processing: use configurable worker pool, default to CPU count
- Stream large files instead of loading into memory — use AsyncRead where possible
- Monitor cache hit rates — target >80% for repeated extractions
Nomad changes 2026-06-01 23:40:55 +02:00			`---`
			`priority: high`
			`---`

			`- Cache keys: content-hash based (hash of file bytes + config), not path-based`
			`- Invalidate cache when extraction config changes (output format, OCR settings, etc.)`
			`- Check cache before any extraction — cache hits should skip all processing`
			`- Concurrent batch processing: use configurable worker pool, default to CPU count`
			`- Stream large files instead of loading into memory — use AsyncRead where possible`
			`- Monitor cache hit rates — target >80% for repeated extractions`