Files
fil/docs/perf/iterations.md
Henrik Jess Nielsen b4c07d3693
All checks were successful
Deploy fil (kreuzberg) / deploy (push) Successful in 49s
Nomad changes
2026-06-01 23:40:55 +02:00

77 lines
6.8 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Performance Iteration Log
Per-iteration tracker for Kreuzberg perf optimization rounds. Append one row per accepted or rejected candidate. Follows the protocol in `profiling.md`. The institutional memory replacement for the `feedback_perf_subagent_verification.md` failure-mode warning.
## Format
| commit | candidate | hotspot self-time | p50 Δ | p95 Δ | SF1 Δ | verdict | notes |
| ------ | --------- | ----------------- | ----- | ----- | ----- | ------- | ----- |
- **commit** — short SHA of the optimization commit (or REVERTED if rejected).
- **candidate** — file:function being optimized.
- **hotspot self-time** — pre-fix percentage from the flamegraph.
- **p50 Δ / p95 Δ** — change in median/tail extraction time vs prior baseline JSON.
- **SF1 Δ** — change in aggregate SF1 score. Must be ≥ 0.1pt or revert.
- **verdict** — ACCEPT, MARGINAL (lift < 3%, no regression), or REJECT.
- **notes** — one sentence on why.
## History
| commit | candidate | hotspot self-time | p50 Δ | p95 Δ | SF1 Δ | verdict | notes |
| --------- | ---------------------------------------------------------------------------------------------------------- | --------------------------------------------- | ----- | ----- | ----- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| REVERTED | normalize_whitespace rewrite | not measured | n/a | n/a | n/a | REJECT | perf-engineer agent ACCEPT'd without measurement; correctness regression on leading/trailing spaces. See `feedback_perf_subagent_verification.md`. |
| REVERTED | split_embedded bullet-count fast path | not measured | +5% | +5% | 0 | REJECT | speculation-driven; ~5% wall-time _regression_. Don't optimize without a flamegraph. |
| b51472c1c | layout_runner: stream DynamicImage→RgbImage + drop redundant clones in table_recognition/layout_validation | n/a (memory, not CPU) | n/a | n/a | 0 | ACCEPT | No pre-M baseline; post-M anchor: 292 MB peak RSS on 60 MB PDF (plain, no layout). Q gates: 143/143 regression, 3/3 smoke, 18/18 guardrail failures identical to pre-M (all pre-existing pdf_oxide upstream). |
| 86a706959 | rendering::markdown: Cow single-pass scans replacing 6 eager .replace() chains | 0.02% self-time post-M (flamegraph fa356cb7e) | n/a | n/a | 0 | ACCEPT | M.2 confirmed effective — render_markdown dropped to 0.02% in post-M flamegraph; was queue candidate #2. |
## Queue — CLEARED (stopping condition met)
**Flamegraph `flamegraphs/fa356cb7e/baseline.svg`** (2026-05-11, 88,524 samples, `--profile profiling`, `--features all`):
| rank | self-time | function |
| ---- | --------- | ------------------------------------------------------------------------------- |
| 1 | 0.50% | `kreuzberg::pdf::oxide::table::extract_tables_native` |
| 2 | 0.33% | `kreuzberg::pdf::oxide::text::extract_text_fast_path` |
| 3 | 0.14% | `kreuzberg::pdf::oxide::hierarchy::extract_all_segments` |
| 4 | 0.12% | `kreuzberg::cache::core::GenericCache::set` |
| 5 | 0.11% | `kreuzberg::pdf::oxide::images::extract_image_positions` |
| 6 | 0.11% | `kreuzberg::pdf::structure::pipeline::extract_document_structure_from_segments` |
| 7 | 0.09% | `kreuzberg::cache::cleanup::scan_cache_directory` |
| 8 | 0.08% | `kreuzberg::pdf::structure::classify::mark_arxiv_noise` |
| 9 | 0.02% | `kreuzberg::rendering::markdown::render_markdown` |
**Breakdown by crate (aggregate, 88,524 total samples):**
- System/other: 48.4%
- Std/core/alloc: 25.3%
- Benchmark_harness (quality scorer): 9.9%
- Pdf_oxide: 9.3%
- Rayon: 3.7%
- **Kreuzberg: 3.4%**
- Tokio: 0.1%
**Stopping condition:** Kreuzberg layer accounts for only 3.4% of total wall time. The previous queue candidates (fuse_paragraphs, text_repair, normalize_key, classify::merge_consecutive_pages) do not appear in the top-25 Kreuzberg frames — confirmed not hot on the baseline pipeline path. Dominant cost is pdf_oxide text/table extraction (9.3%) + system allocator + OS overhead (48.4%) — these are outside kreuzberg's optimization surface.
**Further kreuzberg-layer CPU gains require upstream pdf_oxide work** (table extraction at 0.50% is the single largest kreuzberg-visible hotspot; it delegates to pdf_oxide). Cache I/O (scan_cache_directory) at 0.09% is the next actionable target if cache efficiency becomes a priority, but it's below the noise floor for extraction pipelines.
### Previous blockers resolved
- Symbol-strip blocker from `flamegraphs/61170f7f6/baseline.svg` is fixed: `.task/workflows/benchmark.yml` patched from `--features full``--features all` (kreuzberg-cli has no `full` feature). The `fa356cb7e` flamegraph has 87 Kreuzberg symbols resolved.
### Post-M RSS anchor
Measured 2026-05-11 on `target/profiling/kreuzberg` (post-M, `--features all`, `--profile profiling`):
- **Fixture**: `test_documents/pdf/proof_of_concept_or_gtfo_v13_october_18th_2016.pdf` (60 MB, plain extraction, no layout detection)
- **Peak RSS**: 292 MB (`maximum resident set size: 305,954,816 bytes`)
- **Wall time**: 1.09 real seconds
- **Note**: No pre-M baseline captured; this is the forward anchor. M.1+M.3 impact on layout pipeline RSS requires a separate run with `use_layout_for_markdown=true` on a multi-page PDF.
## Stopping conditions
Per `profiling.md` § Iteration protocol:
- **Three consecutive REJECT or MARGINAL** → optimization curve flattened; stop.
- **Aggregate p95 plaintext ms/MB within 20% of pandoc** → competitive ceiling reached; stop.
- **SF1 regression > 0.1pt on any iteration** → immediate revert; reset the streak.