# Font Configuration Breaking Change (v4.0)

## Summary

The custom font provider is now **enabled by default** for improved PDF performance.

## Breaking Change

**Previous behavior** (v3.x):

- Font provider always enabled, not configurable
- Used system fonts only
- No user control over font loading

**New behavior** (v4.0):

- Font provider enabled by default
- Configurable via `FontConfig` in `PdfConfig`
- Can be disabled, or extended with custom font directories
- ~12-13% faster PDF processing thanks to font caching

## Impact

**Who is affected?**

- Users who rely on the PDF extractor's default font fallback behavior
- Users who want to disable the custom font provider
- Users who need to add custom font directories

**What changes?**

- Default: the custom font provider is now active (breaking change)
- Performance: PDF extraction is ~12-13% faster
- API: new `font_config` option in `PdfConfig`

## Migration

### No Action Required (Recommended)

Most users need no changes. The default configuration enables the font provider and delivers the performance improvement, and the code stays the same as before the upgrade:

=== "Rust"

    ```rust
    use kreuzberg::ExtractionConfig;

    // Same code as before the upgrade. In v4.0 the font provider is
    // enabled automatically and uses system fonts.
    let config = ExtractionConfig::default();
    ```

=== "Python"

    ```python
    from kreuzberg import ExtractionConfig

    # Same code as before the upgrade. In v4.0 the font provider is
    # enabled automatically and uses system fonts.
    config = ExtractionConfig()
    ```

=== "TypeScript"

    ```typescript
    import { ExtractionConfig } from 'kreuzberg';

    // Same code as before the upgrade. In v4.0 the font provider is
    // enabled automatically and uses system fonts.
    const config: ExtractionConfig = {};
    ```

=== "Java"

    ```java
    import dev.kreuzberg.config.*;

    // Same code as before the upgrade. In v4.0 the font provider is
    // enabled automatically and uses system fonts.
    ExtractionConfig config = ExtractionConfig.builder().build();
    ```

=== "Go"

    ```go
    import "github.com/kreuzberg-dev/kreuzberg/v4"

    // Same code as before the upgrade. In v4.0 the font provider is
    // enabled automatically and uses system fonts.
    config := &kreuzberg.ExtractionConfig{}
    ```

=== "Ruby"

    ```ruby
    require 'kreuzberg'

    # Same code as before the upgrade. In v4.0 the font provider is
    # enabled automatically and uses system fonts.
    config = Kreuzberg::ExtractionConfig.new
    ```

=== "C#"

    ```csharp
    using Kreuzberg;

    // Same code as before the upgrade. In v4.0 the font provider is
    // enabled automatically and uses system fonts.
    var config = new ExtractionConfig();
    ```

### Disable Font Provider

If you prefer the default font fallback handling, disable the custom font provider:

=== "Rust"

    ```rust
    use kreuzberg::{ExtractionConfig, PdfConfig, FontConfig};

    let config = ExtractionConfig {
        pdf_options: Some(PdfConfig {
            font_config: Some(FontConfig {
                enabled: false,
                custom_font_dirs: None,
            }),
            ..Default::default()
        }),
        ..Default::default()
    };
    ```

=== "Python"

    ```python
    from kreuzberg import ExtractionConfig, PdfConfig, FontConfig

    config = ExtractionConfig(
        pdf_options=PdfConfig(
            font_config=FontConfig(enabled=False)
        )
    )
    ```

=== "TypeScript"

    ```typescript
    import { ExtractionConfig } from 'kreuzberg';

    const config: ExtractionConfig = {
      pdfOptions: {
        fontConfig: {
          enabled: false
        }
      }
    };
    ```

=== "Java"

    ```java
    import dev.kreuzberg.config.*;

    FontConfig fontConfig = FontConfig.builder()
        .enabled(false)
        .build();
    PdfConfig pdfConfig = PdfConfig.builder()
        .fontConfig(fontConfig)
        .build();
    ExtractionConfig config = ExtractionConfig.builder()
        .pdfOptions(pdfConfig)
        .build();
    ```

=== "Go"

    ```go
    import "github.com/kreuzberg-dev/kreuzberg/v4"

    config := &kreuzberg.ExtractionConfig{
        PdfOptions: &kreuzberg.PdfConfig{
            FontConfig: &kreuzberg.FontConfig{
                Enabled: false,
            },
        },
    }
    ```

=== "Ruby"

    ```ruby
    require 'kreuzberg'

    config = Kreuzberg::ExtractionConfig.new(
      pdf_options: Kreuzberg::PdfConfig.new(
        font_config: Kreuzberg::FontConfig.new(enabled: false)
      )
    )
    ```

=== "C#"

    ```csharp
    using Kreuzberg;

    var fontConfig = new FontConfig { Enabled = false };
    var pdfConfig = new PdfConfig { FontConfig = fontConfig };
    var config = new ExtractionConfig { PdfOptions = pdfConfig };
    ```

### Add Custom Font Directories

To use fonts from custom directories in addition to system fonts:

=== "Rust"

    ```rust
    use kreuzberg::{ExtractionConfig, PdfConfig, FontConfig};
    use std::path::PathBuf;

    let config = ExtractionConfig {
        pdf_options: Some(PdfConfig {
            font_config: Some(FontConfig {
                enabled: true,
                custom_font_dirs: Some(vec![
                    PathBuf::from("/usr/share/fonts/custom"),
                    PathBuf::from("~/my-fonts"), // Tilde expanded automatically
                ]),
            }),
            ..Default::default()
        }),
        ..Default::default()
    };
    ```

=== "Python"

    ```python
    from kreuzberg import ExtractionConfig, PdfConfig, FontConfig

    config = ExtractionConfig(
        pdf_options=PdfConfig(
            font_config=FontConfig(
                enabled=True,
                custom_font_dirs=[
                    "/usr/share/fonts/custom",
                    "~/my-fonts",  # Tilde expanded automatically
                ]
            )
        )
    )
    ```

=== "TypeScript"

    ```typescript
    import { ExtractionConfig } from 'kreuzberg';

    const config: ExtractionConfig = {
      pdfOptions: {
        fontConfig: {
          enabled: true,
          customFontDirs: [
            '/usr/share/fonts/custom',
            '~/my-fonts' // Tilde expanded automatically
          ]
        }
      }
    };
    ```

=== "Java"

    ```java
    import dev.kreuzberg.config.*;
    import java.nio.file.Paths;
    import java.util.Arrays;

    FontConfig fontConfig = FontConfig.builder()
        .enabled(true)
        .customFontDirs(Arrays.asList(
            Paths.get("/usr/share/fonts/custom"),
            Paths.get("~/my-fonts") // Tilde expanded automatically
        ))
        .build();
    PdfConfig pdfConfig = PdfConfig.builder()
        .fontConfig(fontConfig)
        .build();
    ExtractionConfig config =
        ExtractionConfig.builder()
            .pdfOptions(pdfConfig)
            .build();
    ```

=== "Go"

    ```go
    import "github.com/kreuzberg-dev/kreuzberg/v4"

    config := &kreuzberg.ExtractionConfig{
        PdfOptions: &kreuzberg.PdfConfig{
            FontConfig: &kreuzberg.FontConfig{
                Enabled: true,
                CustomFontDirs: []string{
                    "/usr/share/fonts/custom",
                    "~/my-fonts", // Tilde expanded automatically
                },
            },
        },
    }
    ```

=== "Ruby"

    ```ruby
    require 'kreuzberg'

    config = Kreuzberg::ExtractionConfig.new(
      pdf_options: Kreuzberg::PdfConfig.new(
        font_config: Kreuzberg::FontConfig.new(
          enabled: true,
          custom_font_dirs: [
            '/usr/share/fonts/custom',
            '~/my-fonts' # Tilde expanded automatically
          ]
        )
      )
    )
    ```

=== "C#"

    ```csharp
    using Kreuzberg;

    var fontConfig = new FontConfig
    {
        Enabled = true,
        CustomFontDirs = new[]
        {
            "/usr/share/fonts/custom",
            "~/my-fonts" // Tilde expanded automatically
        }
    };
    var pdfConfig = new PdfConfig { FontConfig = fontConfig };
    var config = new ExtractionConfig { PdfOptions = pdfConfig };
    ```

## Configuration Files

### TOML Format

```toml title="Font Configuration in TOML"
[pdf_options.font_config]
enabled = true
custom_font_dirs = ["/usr/share/fonts/custom", "~/my-fonts"]
```

### YAML Format

```yaml title="Font Configuration in YAML"
pdf_options:
  font_config:
    enabled: true
    custom_font_dirs:
      - /usr/share/fonts/custom
      - ~/my-fonts
```

### JSON Format

```json title="Font Configuration in JSON"
{
  "pdf_options": {
    "font_config": {
      "enabled": true,
      "custom_font_dirs": ["/usr/share/fonts/custom", "~/my-fonts"]
    }
  }
}
```

## Path Handling

The font configuration automatically handles:

- **Tilde expansion**: `~/fonts` → `/Users/username/fonts`
- **Relative paths**: `./fonts` → `/absolute/path/to/fonts`
- **Symlinks**: resolved to canonical paths (a security measure)
- **Validation**: each directory is checked for existence, and a warning is logged for any directory that is not found
- **Graceful degradation**: missing directories do not cause extraction to fail

## Global Configuration

**Important**: Font configuration is global per process and must be set **before the first PDF extraction**. The font configuration used by the first extraction stays in effect; a different font configuration passed later is ignored and a warning is logged.

=== "Rust"

    ```rust
    use kreuzberg::{ExtractionConfig, FontConfig, PdfConfig};
    use std::path::PathBuf;

    // CORRECT: set the config before the first extraction
    let config = ExtractionConfig {
        pdf_options: Some(PdfConfig {
            font_config: Some(FontConfig {
                enabled: true,
                custom_font_dirs: Some(vec![
                    PathBuf::from("/usr/share/fonts/custom"),
                ]),
            }),
            ..Default::default()
        }),
        ..Default::default()
    };
    let result = kreuzberg::extract_file("document.pdf", &config)?;

    // INCORRECT: changing the font config after the first extraction
    let new_config = ExtractionConfig {
        pdf_options: Some(PdfConfig {
            font_config: Some(FontConfig {
                enabled: false,
                custom_font_dirs: None,
            }),
            ..Default::default()
        }),
        ..Default::default()
    };
    let result2 = kreuzberg::extract_file("document2.pdf", &new_config)?;
    // Warning logged: "Font config already initialized"
    ```

=== "Python"

    ```python
    from kreuzberg import ExtractionConfig, FontConfig, PdfConfig, extract_file_sync

    # CORRECT: set the config before the first extraction
    config = ExtractionConfig(
        pdf_options=PdfConfig(
            font_config=FontConfig(
                enabled=True,
                custom_font_dirs=["/usr/share/fonts/custom"]
            )
        )
    )
    result = extract_file_sync("document.pdf", config=config)

    # INCORRECT: changing the font config after the first extraction
    new_config = ExtractionConfig(
        pdf_options=PdfConfig(
            font_config=FontConfig(enabled=False)
        )
    )
    result2 = extract_file_sync("document2.pdf", config=new_config)
    # Warning logged: "Font config already initialized"
    ```

=== "TypeScript"

    ```typescript
    import { extractFile, ExtractionConfig } from 'kreuzberg';

    // CORRECT: set the config before the first extraction
    const config: ExtractionConfig = {
      pdfOptions: {
        fontConfig: {
          enabled: true,
          customFontDirs: ['/usr/share/fonts/custom']
        }
      }
    };
    const result = await extractFile('document.pdf', config);

    // INCORRECT: changing the font config after the first extraction
    const newConfig: ExtractionConfig = {
      pdfOptions: {
        fontConfig: {
          enabled: false
        }
      }
    };
    const result2 = await extractFile('document2.pdf', newConfig);
    // Warning logged: "Font config already initialized"
    ```

=== "Java"

    ```java
    import dev.kreuzberg.config.*;
    import java.nio.file.Paths;
    import java.util.Arrays;

    // CORRECT: set the config before the first extraction
    FontConfig fontConfig = FontConfig.builder()
        .enabled(true)
        .customFontDirs(Arrays.asList(Paths.get("/usr/share/fonts/custom")))
        .build();
    PdfConfig pdfConfig = PdfConfig.builder()
        .fontConfig(fontConfig)
        .build();
    ExtractionConfig config = ExtractionConfig.builder()
        .pdfOptions(pdfConfig)
        .build();
    ExtractionResult result = Kreuzberg.extractFile("document.pdf", config);

    // INCORRECT: changing the font config after the first extraction
    FontConfig newFontConfig = FontConfig.builder()
        .enabled(false)
        .build();
    PdfConfig newPdfConfig = PdfConfig.builder()
        .fontConfig(newFontConfig)
        .build();
    ExtractionConfig newConfig = ExtractionConfig.builder()
        .pdfOptions(newPdfConfig)
        .build();
    ExtractionResult result2 = Kreuzberg.extractFile("document2.pdf", newConfig);
    // Warning logged: "Font config already initialized"
    ```

=== "Go"

    ```go
    import (
        "log"

        "github.com/kreuzberg-dev/kreuzberg/v4"
    )

    // CORRECT: set the config before the first extraction
    config := &kreuzberg.ExtractionConfig{
        PdfOptions: &kreuzberg.PdfConfig{
            FontConfig: &kreuzberg.FontConfig{
                Enabled:        true,
                CustomFontDirs: []string{"/usr/share/fonts/custom"},
            },
        },
    }
    result, err := kreuzberg.ExtractFile("document.pdf", config)
    if err != nil {
        log.Fatal(err)
    }

    // INCORRECT: changing the font config after the first extraction
    newConfig := &kreuzberg.ExtractionConfig{
        PdfOptions: &kreuzberg.PdfConfig{
            FontConfig: &kreuzberg.FontConfig{
                Enabled: false,
            },
        },
    }
    result2, err := kreuzberg.ExtractFile("document2.pdf", newConfig)
    if err != nil {
        log.Fatal(err)
    }
    // Warning logged: "Font config already initialized"
    ```

=== "Ruby"

    ```ruby
    require 'kreuzberg'

    # CORRECT: set the config before the first extraction
    config = Kreuzberg::ExtractionConfig.new(
      pdf_options: Kreuzberg::PdfConfig.new(
        font_config: Kreuzberg::FontConfig.new(
          enabled: true,
          custom_font_dirs: ['/usr/share/fonts/custom']
        )
      )
    )
    result = Kreuzberg.extract_file('document.pdf', config)

    # INCORRECT: changing the font config after the first extraction
    new_config = Kreuzberg::ExtractionConfig.new(
      pdf_options: Kreuzberg::PdfConfig.new(
        font_config: Kreuzberg::FontConfig.new(enabled: false)
      )
    )
    result2 = Kreuzberg.extract_file('document2.pdf', new_config)
    # Warning logged: "Font config already initialized"
    ```

=== "C#"

    ```csharp
    using Kreuzberg;

    // CORRECT: set the config before the first extraction
    var fontConfig = new FontConfig
    {
        Enabled = true,
        CustomFontDirs = new[] { "/usr/share/fonts/custom" }
    };
    var pdfConfig = new PdfConfig { FontConfig = fontConfig };
    var config = new ExtractionConfig { PdfOptions = pdfConfig };
    var result = Kreuzberg.ExtractFile("document.pdf", config);

    // INCORRECT: changing the font config after the first extraction
    var newFontConfig = new FontConfig { Enabled = false };
    var newPdfConfig = new PdfConfig { FontConfig = newFontConfig };
    var newConfig = new ExtractionConfig { PdfOptions = newPdfConfig };
    var result2 = Kreuzberg.ExtractFile("document2.pdf", newConfig);
    // Warning logged: "Font config already initialized"
    ```

## Performance Impact

With default settings (`enabled = true`, system fonts only):

- **PDF extraction**: ~12-13% faster
- **Memory**: minimal increase (~100 KB for the font cache)
- **Startup**: lazy initialization, so non-PDF workloads pay no overhead

## Troubleshooting

### Custom fonts not working

**Symptom**: PDF extraction still uses fallback fonts.

**Solutions**:

1. Verify that the directories exist and contain `.ttf`, `.otf`, or `.ttc` files.
2. Check the logs for "Custom font directory not found" warnings.
3. Ensure paths are absolute or can be expanded (for example `~/fonts`).
4. Verify that the font files are readable by the process.

### "Font config already initialized" warning

**Symptom**: Configuration changes are ignored after the first PDF extraction.

**Solution**: Set `FontConfig` in the **first** `ExtractionConfig` used for a PDF extraction. Changing it afterwards is not supported, because font configuration is global per process.

### Performance regression

**Symptom**: PDF extraction is slower after the upgrade.

**Solution**: This is unexpected. Please report it as a bug and include:

- A PDF sample (if shareable)
- A benchmark comparison (before/after)
- The configuration used

## Questions?

- **Issue tracker**:
- **Discussions**:
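## Appendix: Path Handling Sketch

The rules listed under Path Handling can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not Kreuzberg's implementation: the `normalize_font_dirs` helper is hypothetical, relative paths are assumed to resolve against the current working directory, and missing directories are assumed to be skipped with a warning rather than raising.

```python
import logging
from pathlib import Path

logger = logging.getLogger("font_config_sketch")


def normalize_font_dirs(dirs: list[str]) -> list[Path]:
    """Hypothetical helper mirroring the documented path-handling rules."""
    resolved: list[Path] = []
    for raw in dirs:
        # Tilde expansion: "~/fonts" becomes "<home directory>/fonts".
        path = Path(raw).expanduser()
        # Relative paths become absolute (against the current working
        # directory) and symlinks are resolved to their canonical target.
        path = path.resolve()
        if not path.is_dir():
            # Graceful degradation: warn and skip instead of failing.
            logger.warning("Custom font directory not found: %s", path)
            continue
        resolved.append(path)
    return resolved


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(normalize_font_dirs(["/usr/share/fonts/custom", "~/my-fonts", "./fonts"]))
```

Canonicalizing before the existence check means a symlink that points at a missing target is treated the same as a missing directory.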
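## Appendix: Global Configuration Sketch

The process-global, first-write-wins behavior described under Global Configuration can be modeled with a lock-guarded module-level slot. This is a hypothetical illustration, not Kreuzberg's code: the `FontSettings` dataclass and `init_font_config` function are assumptions, and only the warning text "Font config already initialized" is taken from this guide.

```python
import logging
import threading
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("font_config_sketch")


@dataclass(frozen=True)
class FontSettings:
    """Hypothetical stand-in for FontConfig (enabled by default)."""
    enabled: bool = True
    custom_font_dirs: tuple[str, ...] = ()


_lock = threading.Lock()
_active: Optional[FontSettings] = None


def init_font_config(requested: FontSettings) -> FontSettings:
    """Install `requested` on the first call; later calls keep the first value."""
    global _active
    with _lock:
        if _active is None:
            # First PDF extraction: this configuration becomes process-global.
            _active = requested
        elif requested != _active:
            # A different configuration after initialization is ignored.
            logger.warning("Font config already initialized")
        return _active
```

The lock makes the first-call check safe when several threads start PDF extractions at once; whichever thread wins installs its settings, which is why the guide recommends setting the font configuration before any extraction.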
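## Appendix: Checking a Font Directory

The first and last troubleshooting steps for custom fonts (the directory contains `.ttf`, `.otf`, or `.ttc` files, and those files are readable) can be checked with a short script. The `find_font_files` helper is hypothetical and only inspects the file system; it does not reproduce how Kreuzberg loads fonts, and it assumes that subdirectories should be scanned and that extensions are matched case-insensitively.

```python
import os
from pathlib import Path

FONT_SUFFIXES = {".ttf", ".otf", ".ttc"}


def find_font_files(directory: str) -> tuple[list[Path], list[Path]]:
    """Hypothetical check: return (readable, unreadable) font files under `directory`."""
    root = Path(directory).expanduser()
    readable: list[Path] = []
    unreadable: list[Path] = []
    if not root.is_dir():
        # Nothing to scan; the extractor would log a "not found" warning here.
        return readable, unreadable
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in FONT_SUFFIXES:
            # os.access checks read permission for the current process.
            (readable if os.access(path, os.R_OK) else unreadable).append(path)
    return readable, unreadable


if __name__ == "__main__":
    ok, bad = find_font_files("/usr/share/fonts/custom")
    print(f"{len(ok)} readable font files, {len(bad)} unreadable")
```

An empty `readable` list for a configured directory points to troubleshooting step 1 or 4 rather than to a configuration problem.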