docs/migration/v5.0-image-indices.md

# Image Index References (v5.0)

## Summary

`PageContent.images: Vec<Arc<ExtractedImage>>` is removed. Each page now carries `image_indices: Vec<u32>`, a list of zero-based indices into `ExtractionResult.images`.

## Breaking Change

**Previous behavior** (v4.x):

```rust
let result = extractor.extract(path, &config).await?;
for page in result.pages.unwrap_or_default() {
    for image in &page.images {
        println!("{:?}", image.data);
    }
}
```

**New behavior** (v5.0):

```rust
let result = extractor.extract(path, &config).await?;
// Images live on the result; each page only holds indices into this slice.
let images = result.images.as_deref().unwrap_or(&[]);
for page in result.pages.unwrap_or_default() {
    for &idx in &page.image_indices {
        println!("{:?}", images[idx as usize].data);
    }
}
```

`ChunkMetadata` gains the same `image_indices: Vec<u32>` field, populated post-chunking by matching each image's `page_number` against `[first_page, last_page]`.
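
The matching rule for chunks can be sketched as follows. This is an illustration only, not the library's implementation: the structs are simplified stand-ins, the function names `indices_for_chunk` and `populate_image_indices` are hypothetical, and the `Option<u32>` field types and the inclusive treatment of `[first_page, last_page]` are assumptions.

```rust
/// Hypothetical stand-in for the image type; only the page number matters here.
struct ExtractedImage {
    /// Assumed to be optional and numbered like the chunk page range.
    page_number: Option<u32>,
}

/// Hypothetical stand-in carrying only the chunk fields named in this guide.
struct ChunkMetadata {
    first_page: Option<u32>,
    last_page: Option<u32>,
    image_indices: Vec<u32>,
}

/// Collects the index of every image whose page falls in `[first_page, last_page]`.
/// Returns an empty list when the chunk has no page provenance.
fn indices_for_chunk(
    first_page: Option<u32>,
    last_page: Option<u32>,
    images: &[ExtractedImage],
) -> Vec<u32> {
    let (first, last) = match (first_page, last_page) {
        (Some(first), Some(last)) => (first, last),
        // Without both bounds there is nothing to match against.
        _ => return Vec::new(),
    };
    images
        .iter()
        .enumerate()
        .filter(|(_, image)| {
            image
                .page_number
                .map_or(false, |page| (first..=last).contains(&page))
        })
        .map(|(idx, _)| idx as u32)
        .collect()
}

/// Fills `image_indices` for every chunk after chunking has finished.
fn populate_image_indices(chunks: &mut [ChunkMetadata], images: &[ExtractedImage]) {
    for chunk in chunks {
        chunk.image_indices = indices_for_chunk(chunk.first_page, chunk.last_page, images);
    }
}
```

Under these assumptions, an image on a page shared by two chunks appears in both chunks' `image_indices`, and a chunk with no page range never receives any indices.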
## Impact

**Who is affected?**

- Users reading `page.images` directly
- Users passing `PageContent` values across FFI boundaries
- All polyglot bindings (Python, TypeScript, Ruby, PHP, Go, Java, C#, Elixir); these bindings are regenerated automatically

**What changes?**

| Before                   | After                                                         |
| ------------------------ | ------------------------------------------------------------- |
| `page.images[i].data`    | `result.images.unwrap()[page.image_indices[i] as usize].data` |
| `page.images.len()`      | `page.image_indices.len()`                                    |
| `page.images.is_empty()` | `page.image_indices.is_empty()`                               |

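
For call sites that used `page.images` as a list, a small helper can rebuild an equivalent view from the indices. The sketch below is not part of the library: `images_for_page` is a hypothetical name, the structs are simplified stand-ins, and the image list is assumed to be a plain slice of `ExtractedImage`. It uses a bounds-checked lookup, so a stale or out-of-range index is skipped rather than causing a panic.

```rust
/// Hypothetical stand-in for the image type; only `data` is modelled.
struct ExtractedImage {
    data: Vec<u8>,
}

/// Hypothetical stand-in for a page; only the new index field is modelled.
struct PageContent {
    image_indices: Vec<u32>,
}

/// Resolves a page's indices against the document-level image list.
/// Indices that fall outside `images` are silently skipped.
fn images_for_page<'a>(
    page: &PageContent,
    images: &'a [ExtractedImage],
) -> Vec<&'a ExtractedImage> {
    page.image_indices
        .iter()
        .filter_map(|&idx| images.get(idx as usize))
        .collect()
}
```

Called as `images_for_page(&page, result.images.as_deref().unwrap_or(&[]))`, this returns roughly what `page.images` used to hold. Skipping bad indices is a design choice; returning an error instead may suit callers that treat a dangling index as a bug.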
## Known Limitation

`YamlSectionChunker` does not track page provenance (`first_page`/`last_page` are always `None`), so its chunks always have an empty `image_indices`. This is tracked in a separate issue.