fil/docs/perf/iterations.md

# Performance Iteration Log

Per-iteration tracker for Kreuzberg perf optimization rounds. Append one row per accepted or rejected candidate. Follows the protocol in `profiling.md`. The institutional memory replacement for the `feedback_perf_subagent_verification.md` failure-mode warning.

## Format

| commit | candidate | hotspot self-time | p50 Δ | p95 Δ | SF1 Δ | verdict | notes |
| ------ | --------- | ----------------- | ----- | ----- | ----- | ------- | ----- |

- **commit** — short SHA of the optimization commit (or REVERTED if rejected).
- **candidate** — file:function being optimized.
- **hotspot self-time** — pre-fix percentage from the flamegraph.
- **p50 Δ / p95 Δ** — change in median/tail extraction time vs prior baseline JSON.
- **SF1 Δ** — change in aggregate SF1 score. Must be ≥ −0.1pt or revert.
- **verdict** — ACCEPT, MARGINAL (lift < 3%, no regression), or REJECT.
- **notes** — one sentence on why.

## History

| commit    | candidate                                                                                                  | hotspot self-time                             | p50 Δ | p95 Δ | SF1 Δ | verdict | notes                                                                                                                                                                                                         |
| --------- | ---------------------------------------------------------------------------------------------------------- | --------------------------------------------- | ----- | ----- | ----- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| REVERTED  | normalize_whitespace rewrite                                                                               | not measured                                  | n/a   | n/a   | n/a   | REJECT  | perf-engineer agent ACCEPT'd without measurement; correctness regression on leading/trailing spaces. See `feedback_perf_subagent_verification.md`.                                                            |
| REVERTED  | split_embedded bullet-count fast path                                                                      | not measured                                  | +5%   | +5%   | 0     | REJECT  | speculation-driven; ~5% wall-time _regression_. Don't optimize without a flamegraph.                                                                                                                          |
| b51472c1c | layout_runner: stream DynamicImage→RgbImage + drop redundant clones in table_recognition/layout_validation | n/a (memory, not CPU)                         | n/a   | n/a   | 0     | ACCEPT  | No pre-M baseline; post-M anchor: 292 MB peak RSS on 60 MB PDF (plain, no layout). Q gates: 143/143 regression, 3/3 smoke, 18/18 guardrail failures identical to pre-M (all pre-existing pdf_oxide upstream). |
| 86a706959 | rendering::markdown: Cow single-pass scans replacing 6 eager .replace() chains                             | 0.02% self-time post-M (flamegraph fa356cb7e) | n/a   | n/a   | 0     | ACCEPT  | M.2 confirmed effective — render_markdown dropped to 0.02% in post-M flamegraph; was queue candidate #2.                                                                                                      |

## Queue — CLEARED (stopping condition met)

**Flamegraph `flamegraphs/fa356cb7e/baseline.svg`** (2026-05-11, 88,524 samples, `--profile profiling`, `--features all`):

| rank | self-time | function                                                                        |
| ---- | --------- | ------------------------------------------------------------------------------- |
| 1    | 0.50%     | `kreuzberg::pdf::oxide::table::extract_tables_native`                           |
| 2    | 0.33%     | `kreuzberg::pdf::oxide::text::extract_text_fast_path`                           |
| 3    | 0.14%     | `kreuzberg::pdf::oxide::hierarchy::extract_all_segments`                        |
| 4    | 0.12%     | `kreuzberg::cache::core::GenericCache::set`                                     |
| 5    | 0.11%     | `kreuzberg::pdf::oxide::images::extract_image_positions`                        |
| 6    | 0.11%     | `kreuzberg::pdf::structure::pipeline::extract_document_structure_from_segments` |
| 7    | 0.09%     | `kreuzberg::cache::cleanup::scan_cache_directory`                               |
| 8    | 0.08%     | `kreuzberg::pdf::structure::classify::mark_arxiv_noise`                         |
| 9    | 0.02%     | `kreuzberg::rendering::markdown::render_markdown`                               |

**Breakdown by crate (aggregate, 88,524 total samples):**

- System/other: 48.4%
- Std/core/alloc: 25.3%
- Benchmark_harness (quality scorer): 9.9%
- Pdf_oxide: 9.3%
- Rayon: 3.7%
- **Kreuzberg: 3.4%**
- Tokio: 0.1%

**Stopping condition:** Kreuzberg layer accounts for only 3.4% of total wall time. The previous queue candidates (fuse_paragraphs, text_repair, normalize_key, classify::merge_consecutive_pages) do not appear in the top-25 Kreuzberg frames — confirmed not hot on the baseline pipeline path. Dominant cost is pdf_oxide text/table extraction (9.3%) + system allocator + OS overhead (48.4%) — these are outside kreuzberg's optimization surface.

**Further kreuzberg-layer CPU gains require upstream pdf_oxide work** (table extraction at 0.50% is the single largest kreuzberg-visible hotspot; it delegates to pdf_oxide). Cache I/O (scan_cache_directory) at 0.09% is the next actionable target if cache efficiency becomes a priority, but it's below the noise floor for extraction pipelines.

### Previous blockers resolved

- Symbol-strip blocker from `flamegraphs/61170f7f6/baseline.svg` is fixed: `.task/workflows/benchmark.yml` patched from `--features full` → `--features all` (kreuzberg-cli has no `full` feature). The `fa356cb7e` flamegraph has 87 Kreuzberg symbols resolved.

### Post-M RSS anchor

Measured 2026-05-11 on `target/profiling/kreuzberg` (post-M, `--features all`, `--profile profiling`):

- **Fixture**: `test_documents/pdf/proof_of_concept_or_gtfo_v13_october_18th_2016.pdf` (60 MB, plain extraction, no layout detection)
- **Peak RSS**: 292 MB (`maximum resident set size: 305,954,816 bytes`)
- **Wall time**: 1.09 real seconds
- **Note**: No pre-M baseline captured; this is the forward anchor. M.1+M.3 impact on layout pipeline RSS requires a separate run with `use_layout_for_markdown=true` on a multi-page PDF.

## Stopping conditions

Per `profiling.md` § Iteration protocol:

- **Three consecutive REJECT or MARGINAL** → optimization curve flattened; stop.
- **Aggregate p95 plaintext ms/MB within 20% of pandoc** → competitive ceiling reached; stop.
- **SF1 regression > 0.1pt on any iteration** → immediate revert; reset the streak.