Nomad changes

2026-06-01 23:40:55 +02:00
parent 72b1a0a6ed
commit b4c07d3693
5723 changed files with 1130655 additions and 0 deletions
--- a/tools/benchmark-harness/SCHEMA.md
+++ b/tools/benchmark-harness/SCHEMA.md
@@ -0,0 +1,266 @@
+# Aggregation Schema v2.4.0
+
+This document describes the structure of `aggregated.json` produced by `benchmark-harness consolidate`.
+
+## Top-level Shape
+
+```json
+{
+  "schema_version": "2.4.0",
+  "by_framework_mode": {
+    "<aggregate_key>": {
+      /* FrameworkModeAggregation */
+    }
+  },
+  "disk_sizes": {
+    "framework": {
+      /* DiskSizeInfo */
+    }
+  },
+  "comparison": {
+    /* ComparisonData */
+  },
+  "per_fixture_results": [
+    /* PerFixtureRow[] */
+  ],
+  "metadata": {
+    /* ConsolidationMetadata */
+  }
+}
+```
+
+## Output Format Discriminator
+
+The `output_format` field determines:
+
+- **`markdown`**: Supports all metrics including SF1 (structural F1), layout percentiles, and all ranking tables
+- **`plaintext`**: Text-only extraction; SF1 and layout percentiles are `null`; plaintext frameworks never appear in SF1 rankings
+
+## by_framework_mode
+
+Key format differs by framework family:
+
+- **kreuzberg** (`kreuzberg-*`): `{framework_name}:{mode}` — the output format is already encoded
+  in the framework name (e.g. `kreuzberg-markdown-baseline`), so repeating it in the key is
+  redundant.
+- **competitors** (all other frameworks): `{framework}:{output_format}:{mode}` — format is not
+  encoded in the name, so the key carries it explicitly.
+
+Examples:
+
+- `kreuzberg-markdown-baseline:single`
+- `kreuzberg-plaintext-paddle-ocr:batch`
+- `pdfplumber:plaintext:single`
+- `docling:markdown:single`
+
+Each entry contains:
+
+```json
+{
+  "framework": "string", // Framework name without mode suffix
+  "output_format": "markdown|plaintext", // Output format used
+  "mode": "single|batch|...", // Execution mode
+  "cold_start": {
+    /* DurationPercentiles */
+  }, // Optional, if cold start data available
+  "by_file_type": {
+    "pdf": {
+      "file_type": "pdf",
+      "no_ocr": {
+        /* PerformancePercentiles */
+      },
+      "with_ocr": {
+        /* PerformancePercentiles */
+      }
+    }
+  }
+}
+```
+
+## PerformancePercentiles
+
+Contains p50, p95, p99 for all metrics:
+
+```json
+{
+  "successful_sample_count": 42,
+  "total_sample_count": 50,
+  "framework_errors": 0,
+  "harness_errors": 5,
+  "timeouts": 3,
+  "empty_content": 0,
+  "error_details": {
+    "error message": 2
+  },
+  "duration": { "p50": 100.5, "p95": 150.2, "p99": 199.9 },
+  "throughput": { "p50": 5.2, "p95": 4.8, "p99": 3.1 },
+  "memory": { "p50": 150.0, "p95": 200.0, "p99": 250.0 },
+  "cpu": { "p50": 50.0, "p95": 75.0, "p99": 90.0 }, // Optional
+  "extraction_duration": { "p50": 80.0, "p95": 120.0, "p99": 160.0 }, // Optional
+  "quality": {
+    /* QualityPercentiles */
+  }, // Optional, if quality data available
+  "success_rate_percent": 84.0
+}
+```
+
+## QualityPercentiles
+
+Includes p50, p95, p99 for all F1 metrics. Layout percentiles are `null` for plaintext-only frameworks:
+
+```json
+{
+  "f1_text_p50": 0.92,
+  "f1_text_p95": 0.88,
+  "f1_text_p99": 0.75,
+  "f1_numeric_p50": 0.85,
+  "f1_numeric_p95": 0.8,
+  "f1_numeric_p99": 0.7,
+  "f1_layout_p50": 0.78, // null for plaintext output format
+  "f1_layout_p95": 0.72, // null for plaintext output format
+  "f1_layout_p99": 0.65, // null for plaintext output format
+  "quality_score_p50": 0.85,
+  "quality_score_p95": 0.8,
+  "quality_score_p99": 0.7
+}
+```
+
+## PerFixtureRow
+
+One row per unique combination of (framework, output_format, execution_mode, fixture_id, ocr):
+
+```json
+{
+  "framework": "kreuzberg-markdown-baseline",
+  "output_format": "markdown",
+  "execution_mode": "single",
+  "ocr": false,
+  "fixture_id": "sample_doc_1",
+  "file_type": "pdf",
+  "duration_ms": 125.4,
+  "peak_memory_mb": 180.5,
+  "f1_text": 0.92,
+  "f1_layout": 0.78, // null for plaintext mode
+  "f1_numeric": 0.85,
+  "quality_score": 0.85,
+  "correct": true,
+  "success": true,
+  "error_kind": null // "FrameworkError", "HarnessError", "Timeout", etc. if !success
+}
+```
+
+## ComparisonData
+
+Contains all cross-framework rankings split by output format for quality metrics:
+
+```json
+{
+  "performance_ranking": [
+    /* RankedFramework[] */
+  ],
+  "throughput_ranking": [
+    /* RankedFramework[] */
+  ],
+  "memory_ranking": [
+    /* RankedFramework[] */
+  ],
+  "cpu_ranking": [
+    /* RankedFramework[] */
+  ],
+  "quality_ranking": [
+    /* RankedFramework[] */
+  ],
+  "pdf_quality_ranking": [
+    /* RankedFramework[] */
+  ],
+  "pdf_tf1_ranking_markdown": [
+    /* RankedFramework[] — markdown-only */
+  ],
+  "pdf_tf1_ranking_plaintext": [
+    /* RankedFramework[] — plaintext-only */
+  ],
+  "pdf_sf1_ranking_markdown": [
+    /* RankedFramework[] — markdown-only, never plaintext */
+  ],
+  "deltas_vs_baseline": {
+    "<aggregate_key>": {
+      /* DeltaMetrics */
+    }
+  }
+}
+```
+
+### RankedFramework
+
+```json
+{
+  "framework_mode": "kreuzberg-markdown-baseline:single",
+  "rank": 1,
+  "value": 95.5, // The metric value (duration, throughput, etc.)
+  "relative": 1.0 // Ratio relative to best (1.0 = best)
+}
+```
+
+## Migration from v2.3.0 to v2.4.0
+
+### Breaking Changes
+
+1. **Schema version**: Bumped to `"2.4.0"`
+2. **Kreuzberg aggregate key format**: Changed from `framework:output_format:mode` to
+   `framework_name:mode` for all `kreuzberg-*` frameworks. Competitor key format
+   (`framework:output_format:mode`) is unchanged.
+
+### Kreuzberg Consolidation
+
+Language-binding frameworks (`kreuzberg-py`, `kreuzberg-node`, `kreuzberg-rb`, `kreuzberg-go`,
+`kreuzberg-java`, `kreuzberg-csharp`, `kreuzberg-elixir`, `kreuzberg-php`, `kreuzberg-rust`, etc.)
+have been removed. They are replaced by three native pipelines run directly via the kreuzberg CLI:
+
+| Pipeline  | Markdown name                   | Plaintext name                   |
+| --------- | ------------------------------- | -------------------------------- |
+| Baseline  | `kreuzberg-markdown-baseline`   | `kreuzberg-plaintext-baseline`   |
+| Layout    | `kreuzberg-markdown-layout`     | `kreuzberg-plaintext-layout`     |
+| PaddleOCR | `kreuzberg-markdown-paddle-ocr` | `kreuzberg-plaintext-paddle-ocr` |
+
+Batch variants append `-batch` to the framework name (e.g. `kreuzberg-markdown-baseline-batch`),
+which the harness normalises to aggregate key `kreuzberg-markdown-baseline:batch`.
+
+### Key Format Rationale
+
+The format component is implicit in the kreuzberg framework name itself. Duplicating it in the
+aggregate key (`kreuzberg-markdown-baseline:markdown:single`) would be redundant and confusing.
+Competitor names carry no format information, so they continue to need it in the key
+(`docling:markdown:single`).
+
+## Migration from v2.2.0 to v2.3.0
+
+### Breaking Changes
+
+1. **Schema version**: Bumped to `"2.3.0"`
+2. **Framework key format**: Changed from `framework:mode` to `framework:output_format:mode`
+3. **QualityPercentiles**: Added p95 and p99 percentiles for all F1 metrics; `f1_layout_*` fields are now optional (null for plaintext)
+4. **FrameworkModeAggregation**: Added `output_format` field
+5. **ComparisonData**: Replaced `pdf_tf1_ranking` with `pdf_tf1_ranking_markdown` and `pdf_tf1_ranking_plaintext`; `pdf_sf1_ranking` renamed to `pdf_sf1_ranking_markdown` (now markdown-only)
+
+### New Fields
+
+- `per_fixture_results`: Array of individual fixture results preserving per-file measurements
+- `PerFixtureRow`: New struct capturing individual extraction outcomes
+
+### Plaintext-only Behavior
+
+- Plaintext frameworks NEVER appear in `pdf_sf1_ranking_markdown`
+- Plaintext frameworks NEVER appear in `pdf_tf1_ranking_markdown` (they get their own `pdf_tf1_ranking_plaintext`)
+- SF1 and layout percentiles are `null` for plaintext output format
+- All performance rankings (speed, memory, throughput) include both formats without discrimination
+
+## ConsolidationMetadata
+
+```json
+{
+  "total_results": 500,
+  "framework_count": 5,
+  "file_type_count": 8,
+  "timestamp": "2025-05-09T10:15:30Z"
+}
+```