docs/migration/v4.0-fonts.md
# Font Configuration Breaking Change (v4.0)

## Summary

The custom font provider is now **enabled by default** for improved PDF performance.

## Breaking Change

**Previous behavior** (v3.x):

- Font provider always enabled, not configurable
- Used system fonts only
- No user control over font loading

**New behavior** (v4.0):

- Font provider enabled by default
- Configurable via `FontConfig` in `PdfConfig`
- Can be disabled or extended with custom font directories
- ~12-13% faster PDF processing with font caching

## Impact

**Who is affected?**

- Users who rely on the PDF extractor's default font fallback behavior
- Users who want to disable the custom font provider
- Users who need to add custom font directories

**What changes?**

- Default: the custom font provider is now active (breaking change)
- Performance: PDF extraction is ~12-13% faster
- API: new `font_config` option in `PdfConfig`

## Migration

### No Action Required (Recommended)

Most users do not need to change anything. The default behavior provides the performance improvement automatically:

=== "Rust"

    ```rust
    use kreuzberg::ExtractionConfig;

    // Unchanged from v3.x: no font configuration required
    let config = ExtractionConfig::default();
    // In v4.0 the font provider is enabled automatically with system fonts
    ```

=== "Python"

    ```python
    from kreuzberg import ExtractionConfig

    # Unchanged from v3.x: no font configuration required
    config = ExtractionConfig()
    # In v4.0 the font provider is enabled automatically with system fonts
    ```

=== "TypeScript"

    ```typescript
    import { ExtractionConfig } from 'kreuzberg';

    // Unchanged from v3.x: no font configuration required
    const config: ExtractionConfig = {};
    // In v4.0 the font provider is enabled automatically with system fonts
    ```

=== "Java"

    ```java
    import dev.kreuzberg.config.*;

    // Unchanged from v3.x: no font configuration required
    ExtractionConfig config = ExtractionConfig.builder().build();
    // In v4.0 the font provider is enabled automatically with system fonts
    ```

=== "Go"

    ```go
    import "github.com/kreuzberg-dev/kreuzberg/v4"

    // Unchanged from v3.x: no font configuration required
    config := &kreuzberg.ExtractionConfig{}
    // In v4.0 the font provider is enabled automatically with system fonts
    ```

=== "Ruby"

    ```ruby
    require 'kreuzberg'

    # Unchanged from v3.x: no font configuration required
    config = Kreuzberg::ExtractionConfig.new
    # In v4.0 the font provider is enabled automatically with system fonts
    ```

=== "C#"

    ```csharp
    using Kreuzberg;

    // Unchanged from v3.x: no font configuration required
    var config = new ExtractionConfig();
    // In v4.0 the font provider is enabled automatically with system fonts
    ```

### Disable Font Provider

To turn the custom font provider off and fall back to the PDF extractor's default font handling:

=== "Rust"

    ```rust
    use kreuzberg::{ExtractionConfig, PdfConfig, FontConfig};

    let config = ExtractionConfig {
        pdf_options: Some(PdfConfig {
            font_config: Some(FontConfig {
                enabled: false,
                custom_font_dirs: None,
            }),
            ..Default::default()
        }),
        ..Default::default()
    };
    ```

=== "Python"

    ```python
    from kreuzberg import ExtractionConfig, PdfConfig, FontConfig

    config = ExtractionConfig(
        pdf_options=PdfConfig(
            font_config=FontConfig(enabled=False)
        )
    )
    ```

=== "TypeScript"

    ```typescript
    import { ExtractionConfig } from 'kreuzberg';

    const config: ExtractionConfig = {
      pdfOptions: {
        fontConfig: {
          enabled: false
        }
      }
    };
    ```

=== "Java"

    ```java
    import dev.kreuzberg.config.*;

    FontConfig fontConfig = FontConfig.builder()
        .enabled(false)
        .build();

    PdfConfig pdfConfig = PdfConfig.builder()
        .fontConfig(fontConfig)
        .build();

    ExtractionConfig config = ExtractionConfig.builder()
        .pdfOptions(pdfConfig)
        .build();
    ```

=== "Go"

    ```go
    import "github.com/kreuzberg-dev/kreuzberg/v4"

    config := &kreuzberg.ExtractionConfig{
        PdfOptions: &kreuzberg.PdfConfig{
            FontConfig: &kreuzberg.FontConfig{
                Enabled: false,
            },
        },
    }
    ```

=== "Ruby"

    ```ruby
    require 'kreuzberg'

    config = Kreuzberg::ExtractionConfig.new(
      pdf_options: Kreuzberg::PdfConfig.new(
        font_config: Kreuzberg::FontConfig.new(enabled: false)
      )
    )
    ```

=== "C#"

    ```csharp
    using Kreuzberg;

    var fontConfig = new FontConfig { Enabled = false };
    var pdfConfig = new PdfConfig { FontConfig = fontConfig };
    var config = new ExtractionConfig { PdfOptions = pdfConfig };
    ```

### Add Custom Font Directories

To use fonts from custom directories (in addition to system fonts):

=== "Rust"

    ```rust
    use kreuzberg::{ExtractionConfig, PdfConfig, FontConfig};
    use std::path::PathBuf;

    let config = ExtractionConfig {
        pdf_options: Some(PdfConfig {
            font_config: Some(FontConfig {
                enabled: true,
                custom_font_dirs: Some(vec![
                    PathBuf::from("/usr/share/fonts/custom"),
                    PathBuf::from("~/my-fonts"), // Tilde expanded automatically
                ]),
            }),
            ..Default::default()
        }),
        ..Default::default()
    };
    ```

=== "Python"

    ```python
    from kreuzberg import ExtractionConfig, PdfConfig, FontConfig

    config = ExtractionConfig(
        pdf_options=PdfConfig(
            font_config=FontConfig(
                enabled=True,
                custom_font_dirs=[
                    "/usr/share/fonts/custom",
                    "~/my-fonts"  # Tilde expanded automatically
                ]
            )
        )
    )
    ```

=== "TypeScript"

    ```typescript
    import { ExtractionConfig } from 'kreuzberg';

    const config: ExtractionConfig = {
      pdfOptions: {
        fontConfig: {
          enabled: true,
          customFontDirs: [
            '/usr/share/fonts/custom',
            '~/my-fonts' // Tilde expanded automatically
          ]
        }
      }
    };
    ```

=== "Java"

    ```java
    import dev.kreuzberg.config.*;
    import java.nio.file.Paths;
    import java.util.Arrays;

    FontConfig fontConfig = FontConfig.builder()
        .enabled(true)
        .customFontDirs(Arrays.asList(
            Paths.get("/usr/share/fonts/custom"),
            Paths.get("~/my-fonts") // Tilde expanded automatically
        ))
        .build();

    PdfConfig pdfConfig = PdfConfig.builder()
        .fontConfig(fontConfig)
        .build();

    ExtractionConfig config = ExtractionConfig.builder()
        .pdfOptions(pdfConfig)
        .build();
    ```

=== "Go"

    ```go
    import "github.com/kreuzberg-dev/kreuzberg/v4"

    config := &kreuzberg.ExtractionConfig{
        PdfOptions: &kreuzberg.PdfConfig{
            FontConfig: &kreuzberg.FontConfig{
                Enabled: true,
                CustomFontDirs: []string{
                    "/usr/share/fonts/custom",
                    "~/my-fonts", // Tilde expanded automatically
                },
            },
        },
    }
    ```

=== "Ruby"

    ```ruby
    require 'kreuzberg'

    config = Kreuzberg::ExtractionConfig.new(
      pdf_options: Kreuzberg::PdfConfig.new(
        font_config: Kreuzberg::FontConfig.new(
          enabled: true,
          custom_font_dirs: [
            '/usr/share/fonts/custom',
            '~/my-fonts' # Tilde expanded automatically
          ]
        )
      )
    )
    ```

=== "C#"

    ```csharp
    using Kreuzberg;

    var fontConfig = new FontConfig
    {
        Enabled = true,
        CustomFontDirs = new[]
        {
            "/usr/share/fonts/custom",
            "~/my-fonts" // Tilde expanded automatically
        }
    };

    var pdfConfig = new PdfConfig { FontConfig = fontConfig };
    var config = new ExtractionConfig { PdfOptions = pdfConfig };
    ```

## Configuration Files

### TOML Format

```toml title="Font Configuration in TOML"
[pdf_options.font_config]
enabled = true
custom_font_dirs = ["/usr/share/fonts/custom", "~/my-fonts"]
```

### YAML Format

```yaml title="Font Configuration in YAML"
pdf_options:
  font_config:
    enabled: true
    custom_font_dirs:
      - /usr/share/fonts/custom
      - ~/my-fonts
```

### JSON Format

```json title="Font Configuration in JSON"
{
  "pdf_options": {
    "font_config": {
      "enabled": true,
      "custom_font_dirs": ["/usr/share/fonts/custom", "~/my-fonts"]
    }
  }
}
```
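
All three formats describe the same nested structure. As a rough illustration only, the sketch below parses the JSON variant with Python's standard `json` module and reads the font settings, falling back to the documented defaults (provider enabled, no custom directories) when keys are missing. The helper name `read_font_settings` is hypothetical and not part of Kreuzberg, which performs its own config loading.

```python
import json

# Same content as the JSON example in this section.
CONFIG_TEXT = """
{
  "pdf_options": {
    "font_config": {
      "enabled": true,
      "custom_font_dirs": ["/usr/share/fonts/custom", "~/my-fonts"]
    }
  }
}
"""


def read_font_settings(config: dict) -> tuple[bool, list[str]]:
    """Hypothetical helper: extract font settings from a parsed config mapping.

    Missing keys fall back to the defaults described in this guide:
    provider enabled, no custom directories.
    """
    font_config = (config.get("pdf_options") or {}).get("font_config") or {}
    enabled = font_config.get("enabled", True)
    custom_dirs = list(font_config.get("custom_font_dirs") or [])
    return enabled, custom_dirs


enabled, custom_dirs = read_font_settings(json.loads(CONFIG_TEXT))
print(enabled, custom_dirs)
```

An empty config (`{}`) yields `(True, [])` in this sketch, which mirrors the "no action required" default.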

## Path Handling

The font configuration automatically handles:

- **Tilde expansion**: `~/fonts` → `/Users/username/fonts`
- **Relative paths**: `./fonts` → `/absolute/path/to/fonts`
- **Symlinks**: Resolved to canonical paths (security measure)
- **Validation**: Each directory is checked for existence; a warning is logged if it is not found
- **Graceful degradation**: Missing directories don't cause failures
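
The behavior listed above can be approximated in a few lines. The following is a minimal sketch, not Kreuzberg's implementation: `normalize_font_dirs` is a hypothetical name, and the warning text is borrowed from the log message mentioned in the Troubleshooting section.

```python
import logging
from pathlib import Path

logger = logging.getLogger("font_paths")


def normalize_font_dirs(raw_dirs: list[str]) -> list[Path]:
    """Hypothetical sketch of the path handling described in this section."""
    resolved: list[Path] = []
    for raw in raw_dirs:
        # expanduser() handles "~"; resolve() makes the path absolute
        # and follows symlinks to the canonical location.
        path = Path(raw).expanduser().resolve()
        if not path.is_dir():
            # Missing directories are skipped with a warning, not an error.
            logger.warning("Custom font directory not found: %s", path)
            continue
        resolved.append(path)
    return resolved


print(normalize_font_dirs(["/usr/share/fonts/custom", "~/my-fonts", "./fonts"]))
```

Directories that do not exist on the machine running the sketch are simply left out of the result.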

## Global Configuration

**Important**: Font configuration is global per process and must be set **before the first PDF extraction**.

=== "Rust"

    ```rust
    // CORRECT: Set config before first extraction
    let config = ExtractionConfig {
        pdf_options: Some(PdfConfig {
            font_config: Some(FontConfig {
                enabled: true,
                custom_font_dirs: Some(vec![
                    PathBuf::from("/usr/share/fonts/custom"),
                ]),
            }),
            ..Default::default()
        }),
        ..Default::default()
    };

    let result = kreuzberg::extract_file("document.pdf", &config)?;

    // INCORRECT: Attempting to change config after first extraction
    let new_config = ExtractionConfig {
        pdf_options: Some(PdfConfig {
            font_config: Some(FontConfig {
                enabled: false,
                custom_font_dirs: None,
            }),
            ..Default::default()
        }),
        ..Default::default()
    };
    let result2 = kreuzberg::extract_file("document2.pdf", &new_config)?;
    // Warning logged: "Font config already initialized"
    ```

=== "Python"

    ```python
    # CORRECT: Set config before first extraction
    config = ExtractionConfig(
        pdf_options=PdfConfig(
            font_config=FontConfig(
                enabled=True,
                custom_font_dirs=["/usr/share/fonts/custom"]
            )
        )
    )
    result = extract_file("document.pdf", config)

    # INCORRECT: Attempting to change config after first extraction
    new_config = ExtractionConfig(
        pdf_options=PdfConfig(
            font_config=FontConfig(enabled=False)
        )
    )
    result2 = extract_file("document2.pdf", new_config)
    # Warning logged: "Font config already initialized"
    ```

=== "TypeScript"

    ```typescript
    // CORRECT: Set config before first extraction
    const config: ExtractionConfig = {
      pdfOptions: {
        fontConfig: {
          enabled: true,
          customFontDirs: ['/usr/share/fonts/custom']
        }
      }
    };
    const result = await extractFile('document.pdf', config);

    // INCORRECT: Attempting to change config after first extraction
    const newConfig: ExtractionConfig = {
      pdfOptions: {
        fontConfig: { enabled: false }
      }
    };
    const result2 = await extractFile('document2.pdf', newConfig);
    // Warning logged: "Font config already initialized"
    ```

=== "Java"

    ```java
    // CORRECT: Set config before first extraction
    FontConfig fontConfig = FontConfig.builder()
        .enabled(true)
        .customFontDirs(Arrays.asList(Paths.get("/usr/share/fonts/custom")))
        .build();
    PdfConfig pdfConfig = PdfConfig.builder()
        .fontConfig(fontConfig)
        .build();
    ExtractionConfig config = ExtractionConfig.builder()
        .pdfOptions(pdfConfig)
        .build();
    ExtractionResult result = Kreuzberg.extractFile("document.pdf", config);

    // INCORRECT: Attempting to change config after first extraction
    FontConfig newFontConfig = FontConfig.builder()
        .enabled(false)
        .build();
    PdfConfig newPdfConfig = PdfConfig.builder()
        .fontConfig(newFontConfig)
        .build();
    ExtractionConfig newConfig = ExtractionConfig.builder()
        .pdfOptions(newPdfConfig)
        .build();
    ExtractionResult result2 = Kreuzberg.extractFile("document2.pdf", newConfig);
    // Warning logged: "Font config already initialized"
    ```

=== "Go"

    ```go
    // CORRECT: Set config before first extraction
    config := &kreuzberg.ExtractionConfig{
        PdfOptions: &kreuzberg.PdfConfig{
            FontConfig: &kreuzberg.FontConfig{
                Enabled:        true,
                CustomFontDirs: []string{"/usr/share/fonts/custom"},
            },
        },
    }
    result, _ := kreuzberg.ExtractFile("document.pdf", config)

    // INCORRECT: Attempting to change config after first extraction
    newConfig := &kreuzberg.ExtractionConfig{
        PdfOptions: &kreuzberg.PdfConfig{
            FontConfig: &kreuzberg.FontConfig{
                Enabled: false,
            },
        },
    }
    result2, _ := kreuzberg.ExtractFile("document2.pdf", newConfig)
    // Warning logged: "Font config already initialized"
    ```

=== "Ruby"

    ```ruby
    # CORRECT: Set config before first extraction
    config = Kreuzberg::ExtractionConfig.new(
      pdf_options: Kreuzberg::PdfConfig.new(
        font_config: Kreuzberg::FontConfig.new(
          enabled: true,
          custom_font_dirs: ['/usr/share/fonts/custom']
        )
      )
    )
    result = Kreuzberg.extract_file('document.pdf', config)

    # INCORRECT: Attempting to change config after first extraction
    new_config = Kreuzberg::ExtractionConfig.new(
      pdf_options: Kreuzberg::PdfConfig.new(
        font_config: Kreuzberg::FontConfig.new(enabled: false)
      )
    )
    result2 = Kreuzberg.extract_file('document2.pdf', new_config)
    # Warning logged: "Font config already initialized"
    ```

=== "C#"

    ```csharp
    // CORRECT: Set config before first extraction
    var fontConfig = new FontConfig
    {
        Enabled = true,
        CustomFontDirs = new[] { "/usr/share/fonts/custom" }
    };
    var pdfConfig = new PdfConfig { FontConfig = fontConfig };
    var config = new ExtractionConfig { PdfOptions = pdfConfig };
    var result = Kreuzberg.ExtractFile("document.pdf", config);

    // INCORRECT: Attempting to change config after first extraction
    var newFontConfig = new FontConfig { Enabled = false };
    var newPdfConfig = new PdfConfig { FontConfig = newFontConfig };
    var newConfig = new ExtractionConfig { PdfOptions = newPdfConfig };
    var result2 = Kreuzberg.ExtractFile("document2.pdf", newConfig);
    // Warning logged: "Font config already initialized"
    ```
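
The "first configuration wins" behavior shown in these examples resembles a set-once global. The sketch below models that idea in plain Python and is not Kreuzberg's code: `FontSettings` and `init_font_settings` are hypothetical names, and warning only when a later request differs from the active one is an assumption made for the illustration.

```python
import logging
import threading
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("font_config")


@dataclass(frozen=True)
class FontSettings:
    """Stand-in for the font options; not Kreuzberg's FontConfig class."""
    enabled: bool = True
    custom_font_dirs: tuple[str, ...] = ()


_lock = threading.Lock()
_active: Optional[FontSettings] = None


def init_font_settings(requested: FontSettings) -> FontSettings:
    """Apply `requested` on the first call; later calls keep the first value."""
    global _active
    with _lock:
        if _active is None:
            _active = requested
        elif requested != _active:
            # Assumption: only a conflicting later request triggers the warning.
            logger.warning("Font config already initialized")
        return _active


first = init_font_settings(FontSettings(custom_font_dirs=("/usr/share/fonts/custom",)))
second = init_font_settings(FontSettings(enabled=False))
print(second is first)  # the second request was ignored
```

In practice this means an application should build its font settings once at startup and reuse the same `ExtractionConfig` for every extraction.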

## Performance Impact

With default settings (`enabled = true`, system fonts only):

- **PDF extraction**: ~12-13% faster
- **Memory**: Minimal increase (~100 KB for the font cache)
- **Startup**: Lazy initialization (no overhead for non-PDF workloads)

## Troubleshooting

### Custom fonts not working

**Symptom**: PDF extraction still uses fallback fonts

**Solutions**:

1. Verify that the directories exist and contain `.ttf`, `.otf`, or `.ttc` files
2. Check the logs for "Custom font directory not found" warnings
3. Ensure paths are absolute or properly expanded
4. Verify that the font files are readable
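
The checks above can be scripted. This is a minimal diagnostic sketch using only the Python standard library; `check_font_dir` is a hypothetical helper, and it inspects only the top level of each directory (whether the font provider also scans subdirectories is not stated in this guide).

```python
import os
from pathlib import Path

FONT_SUFFIXES = {".ttf", ".otf", ".ttc"}


def check_font_dir(raw: str) -> list[str]:
    """Return human-readable problems found for one font directory."""
    path = Path(raw).expanduser().resolve()
    if not path.is_dir():
        return [f"{path}: directory not found"]
    problems: list[str] = []
    # Only the top level is inspected in this sketch.
    fonts = [p for p in path.iterdir() if p.suffix.lower() in FONT_SUFFIXES]
    if not fonts:
        problems.append(f"{path}: no .ttf/.otf/.ttc files")
    for font in fonts:
        if not os.access(font, os.R_OK):
            problems.append(f"{font}: not readable")
    return problems


for directory in ["/usr/share/fonts/custom", "~/my-fonts"]:
    for problem in check_font_dir(directory):
        print(problem)
```

An empty result for every configured directory rules out the first, third, and fourth causes, leaving the logs as the next place to look.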

### "Font config already initialized" warning

**Symptom**: Configuration changes are ignored after the first PDF extraction

**Solution**: Set `FontConfig` in the **first** `ExtractionConfig` used. Font configuration is global per process, so later changes are not supported.

### Performance regression

**Symptom**: PDF extraction is slower after the upgrade

**Solution**: This is unexpected. Please report it as a bug and include:

- PDF sample (if shareable)
- Benchmark comparison (before/after)
- Configuration used
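
For the before/after comparison, a small timing harness is usually enough. The sketch below is one possible approach, with hypothetical helper names; pass a zero-argument callable that runs your own extraction (for example a `lambda` wrapping the extraction call you already use) once per library version.

```python
import statistics
import time
from typing import Callable


def median_seconds(run: Callable[[], object], repeats: int = 5) -> float:
    """Median wall-clock time of `run` over `repeats` calls."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def percent_change(before: float, after: float) -> float:
    """Positive result = slower after the upgrade, negative = faster."""
    return (after - before) / before * 100.0


# Example with made-up timings: 2.00 s before the upgrade, 2.30 s after.
print(f"{percent_change(2.00, 2.30):+.1f}%")
```

Reporting the median over several runs, together with the number of repeats, makes the regression easier to reproduce.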

## Questions?

- **Issue tracker**: <https://github.com/kreuzberg-dev/kreuzberg/issues>
- **Discussions**: <https://github.com/kreuzberg-dev/kreuzberg/discussions>