Henrik Jess Nielsen b4c07d3693
Nomad changes
2026-06-01 23:40:55 +02:00


Font Configuration Breaking Change (v4.0)

Summary

Custom font provider is now enabled by default for improved PDF performance.

Breaking Change

Previous behavior (v3.x):

  • Font provider always enabled, not configurable
  • Used system fonts only
  • No user control over font loading

New behavior (v4.0):

  • Font provider enabled by default
  • Configurable via FontConfig in PdfConfig
  • Can disable or add custom font directories
  • ~12-13% faster PDF processing with font caching

Impact

Who is affected?

  • Users who rely on the PDF extractor's default font fallback behavior
  • Users who want to disable the custom font provider
  • Users who need to add custom font directories

What changes?

  • Default: Custom font provider now active (breaking change)
  • Performance: PDF extraction 12-13% faster
  • API: New font_config option in PdfConfig

Migration

For most users, no changes are needed; the new default delivers the performance improvement automatically:

=== "Rust"

```rust
use kreuzberg::ExtractionConfig;

// Previous (v3.x) and current (v4.0): identical code, no font configuration needed
let config = ExtractionConfig::default();
// In v4.0 the font provider is enabled automatically and uses system fonts
```

=== "Python"

```python
from kreuzberg import ExtractionConfig

# Previous (v3.x) and current (v4.0): identical code
config = ExtractionConfig()
# In v4.0 the font provider is enabled automatically and uses system fonts
```

=== "TypeScript"

```typescript
import { ExtractionConfig } from 'kreuzberg';

// Previous (v3.x) and current (v4.0): identical code
const config: ExtractionConfig = {};
// In v4.0 the font provider is enabled automatically and uses system fonts
```

=== "Java"

```java
import dev.kreuzberg.config.*;

// Previous (v3.x) and current (v4.0): identical code
ExtractionConfig config = ExtractionConfig.builder().build();
// In v4.0 the font provider is enabled automatically and uses system fonts
```

=== "Go"

```go
import "github.com/kreuzberg-dev/kreuzberg/v4"

// Previous (v3.x) and current (v4.0): identical code
config := &kreuzberg.ExtractionConfig{}
// In v4.0 the font provider is enabled automatically and uses system fonts
```

=== "Ruby"

```ruby
require 'kreuzberg'

# Previous (v3.x) and current (v4.0): identical code
config = Kreuzberg::ExtractionConfig.new
# In v4.0 the font provider is enabled automatically and uses system fonts
```

=== "C#"

```csharp
using Kreuzberg;

// Previous (v3.x) and current (v4.0): identical code
var config = new ExtractionConfig();
// In v4.0 the font provider is enabled automatically and uses system fonts
```

Disable Font Provider

If you prefer the extractor's standard font handling, disable the custom font provider:

=== "Rust"

```rust
use kreuzberg::{ExtractionConfig, PdfConfig, FontConfig};

let config = ExtractionConfig {
    pdf_options: Some(PdfConfig {
        font_config: Some(FontConfig {
            enabled: false,
            custom_font_dirs: None,
        }),
        ..Default::default()
    }),
    ..Default::default()
};
```

=== "Python"

```python
from kreuzberg import ExtractionConfig, PdfConfig, FontConfig

config = ExtractionConfig(
    pdf_options=PdfConfig(
        font_config=FontConfig(enabled=False)
    )
)
```

=== "TypeScript"

```typescript
import { ExtractionConfig } from 'kreuzberg';

const config: ExtractionConfig = {
  pdfOptions: {
    fontConfig: {
      enabled: false
    }
  }
};
```

=== "Java"

```java
import dev.kreuzberg.config.*;

FontConfig fontConfig = FontConfig.builder()
    .enabled(false)
    .build();

PdfConfig pdfConfig = PdfConfig.builder()
    .fontConfig(fontConfig)
    .build();

ExtractionConfig config = ExtractionConfig.builder()
    .pdfOptions(pdfConfig)
    .build();
```

=== "Go"

```go
import "github.com/kreuzberg-dev/kreuzberg/v4"

config := &kreuzberg.ExtractionConfig{
    PdfOptions: &kreuzberg.PdfConfig{
        FontConfig: &kreuzberg.FontConfig{
            Enabled: false,
        },
    },
}
```

=== "Ruby"

```ruby
require 'kreuzberg'

config = Kreuzberg::ExtractionConfig.new(
  pdf_options: Kreuzberg::PdfConfig.new(
    font_config: Kreuzberg::FontConfig.new(enabled: false)
  )
)
```

=== "C#"

```csharp
using Kreuzberg;

var fontConfig = new FontConfig { Enabled = false };
var pdfConfig = new PdfConfig { FontConfig = fontConfig };
var config = new ExtractionConfig { PdfOptions = pdfConfig };
```

Add Custom Font Directories

To use fonts from custom directories (in addition to system fonts):

=== "Rust"

```rust
use kreuzberg::{ExtractionConfig, PdfConfig, FontConfig};
use std::path::PathBuf;

let config = ExtractionConfig {
    pdf_options: Some(PdfConfig {
        font_config: Some(FontConfig {
            enabled: true,
            custom_font_dirs: Some(vec![
                PathBuf::from("/usr/share/fonts/custom"),
                PathBuf::from("~/my-fonts"),  // Tilde expanded automatically
            ]),
        }),
        ..Default::default()
    }),
    ..Default::default()
};
```

=== "Python"

```python
from kreuzberg import ExtractionConfig, PdfConfig, FontConfig

config = ExtractionConfig(
    pdf_options=PdfConfig(
        font_config=FontConfig(
            enabled=True,
            custom_font_dirs=[
                "/usr/share/fonts/custom",
                "~/my-fonts"  # Tilde expanded automatically
            ]
        )
    )
)
```

=== "TypeScript"

```typescript
import { ExtractionConfig } from 'kreuzberg';

const config: ExtractionConfig = {
  pdfOptions: {
    fontConfig: {
      enabled: true,
      customFontDirs: [
        '/usr/share/fonts/custom',
        '~/my-fonts'  // Tilde expanded automatically
      ]
    }
  }
};
```

=== "Java"

```java
import dev.kreuzberg.config.*;
import java.nio.file.Paths;
import java.util.Arrays;

FontConfig fontConfig = FontConfig.builder()
    .enabled(true)
    .customFontDirs(Arrays.asList(
        Paths.get("/usr/share/fonts/custom"),
        Paths.get("~/my-fonts")  // Tilde expanded automatically
    ))
    .build();

PdfConfig pdfConfig = PdfConfig.builder()
    .fontConfig(fontConfig)
    .build();

ExtractionConfig config = ExtractionConfig.builder()
    .pdfOptions(pdfConfig)
    .build();
```

=== "Go"

```go
import "github.com/kreuzberg-dev/kreuzberg/v4"

config := &kreuzberg.ExtractionConfig{
    PdfOptions: &kreuzberg.PdfConfig{
        FontConfig: &kreuzberg.FontConfig{
            Enabled: true,
            CustomFontDirs: []string{
                "/usr/share/fonts/custom",
                "~/my-fonts",  // Tilde expanded automatically
            },
        },
    },
}
```

=== "Ruby"

```ruby
require 'kreuzberg'

config = Kreuzberg::ExtractionConfig.new(
  pdf_options: Kreuzberg::PdfConfig.new(
    font_config: Kreuzberg::FontConfig.new(
      enabled: true,
      custom_font_dirs: [
        '/usr/share/fonts/custom',
        '~/my-fonts'  # Tilde expanded automatically
      ]
    )
  )
)
```

=== "C#"

```csharp
using Kreuzberg;

var fontConfig = new FontConfig
{
    Enabled = true,
    CustomFontDirs = new[]
    {
        "/usr/share/fonts/custom",
        "~/my-fonts"  // Tilde expanded automatically
    }
};

var pdfConfig = new PdfConfig { FontConfig = fontConfig };
var config = new ExtractionConfig { PdfOptions = pdfConfig };
```

Configuration Files

TOML Format

```toml
[pdf_options.font_config]
enabled = true
custom_font_dirs = ["/usr/share/fonts/custom", "~/my-fonts"]
```

YAML Format

```yaml
pdf_options:
  font_config:
    enabled: true
    custom_font_dirs:
      - /usr/share/fonts/custom
      - ~/my-fonts
```

JSON Format

```json
{
  "pdf_options": {
    "font_config": {
      "enabled": true,
      "custom_font_dirs": ["/usr/share/fonts/custom", "~/my-fonts"]
    }
  }
}
```

Path Handling

The font configuration automatically handles:

  • Tilde expansion: ~/fonts → /Users/username/fonts
  • Relative paths: ./fonts → /absolute/path/to/fonts
  • Symlinks: Resolved to canonical paths (security measure)
  • Validation: Directories must exist to be used; a warning is logged for any directory that is not found
  • Graceful degradation: Missing directories don't cause failures
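The path rules above can be approximated in a few lines of Python, which is handy for previewing which entries of a `custom_font_dirs` list would actually contribute fonts. The sketch mirrors the documented behavior (tilde expansion, absolute resolution, symlink canonicalization, warn-and-skip for missing directories); it is not kreuzberg's implementation, and `normalize_font_dirs` is a hypothetical name.

```python
import logging
from pathlib import Path

logger = logging.getLogger("font_dirs_sketch")


def normalize_font_dirs(dirs: list[str]) -> list[Path]:
    """Hypothetical sketch of the documented path handling; not kreuzberg's code."""
    resolved: list[Path] = []
    for raw in dirs:
        # Tilde expansion: "~/fonts" becomes "<home directory>/fonts".
        # resolve() then makes relative paths absolute (against the current
        # working directory) and follows symlinks to the canonical path.
        candidate = Path(raw).expanduser().resolve()
        # Validation with graceful degradation: warn and skip, never raise.
        # The message text is borrowed from the Troubleshooting section.
        if not candidate.is_dir():
            logger.warning("Custom font directory not found: %s", raw)
            continue
        resolved.append(candidate)
    return resolved
```

For example, `normalize_font_dirs(["~/my-fonts", "./fonts"])` returns only the entries that exist on disk, as absolute canonical paths, and logs a warning for the rest.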

Global Configuration

Important: Font configuration is global per process and must be set before the first PDF extraction.

=== "Rust"

```rust
use kreuzberg::{ExtractionConfig, FontConfig, PdfConfig};
use std::path::PathBuf;

// CORRECT: Set config before first extraction
let config = ExtractionConfig {
    pdf_options: Some(PdfConfig {
        font_config: Some(FontConfig {
            enabled: true,
            custom_font_dirs: Some(vec![
                PathBuf::from("/usr/share/fonts/custom"),
            ]),
        }),
        ..Default::default()
    }),
    ..Default::default()
};

let result = kreuzberg::extract_file("document.pdf", &config)?;

// INCORRECT: Attempting to change config after first extraction
let new_config = ExtractionConfig {
    pdf_options: Some(PdfConfig {
        font_config: Some(FontConfig {
            enabled: false,
            custom_font_dirs: None,
        }),
        ..Default::default()
    }),
    ..Default::default()
};
let result2 = kreuzberg::extract_file("document2.pdf", &new_config)?;
// Warning logged: "Font config already initialized"
```

=== "Python"

```python
from kreuzberg import ExtractionConfig, FontConfig, PdfConfig, extract_file

# CORRECT: Set config before first extraction
config = ExtractionConfig(
    pdf_options=PdfConfig(
        font_config=FontConfig(
            enabled=True,
            custom_font_dirs=["/usr/share/fonts/custom"]
        )
    )
)
result = extract_file("document.pdf", config)

# INCORRECT: Attempting to change config after first extraction
new_config = ExtractionConfig(
    pdf_options=PdfConfig(
        font_config=FontConfig(enabled=False)
    )
)
result2 = extract_file("document2.pdf", new_config)
# Warning logged: "Font config already initialized"
```

=== "TypeScript"

```typescript
import { ExtractionConfig, extractFile } from 'kreuzberg';

// CORRECT: Set config before first extraction
const config: ExtractionConfig = {
  pdfOptions: {
    fontConfig: {
      enabled: true,
      customFontDirs: ['/usr/share/fonts/custom']
    }
  }
};
const result = await extractFile('document.pdf', config);

// INCORRECT: Attempting to change config after first extraction
const newConfig: ExtractionConfig = {
  pdfOptions: {
    fontConfig: { enabled: false }
  }
};
const result2 = await extractFile('document2.pdf', newConfig);
// Warning logged: "Font config already initialized"
```

=== "Java"

```java
import dev.kreuzberg.config.*;
import java.nio.file.Paths;
import java.util.Arrays;

// CORRECT: Set config before first extraction
FontConfig fontConfig = FontConfig.builder()
    .enabled(true)
    .customFontDirs(Arrays.asList(Paths.get("/usr/share/fonts/custom")))
    .build();
PdfConfig pdfConfig = PdfConfig.builder()
    .fontConfig(fontConfig)
    .build();
ExtractionConfig config = ExtractionConfig.builder()
    .pdfOptions(pdfConfig)
    .build();
ExtractionResult result = Kreuzberg.extractFile("document.pdf", config);

// INCORRECT: Attempting to change config after first extraction
FontConfig newFontConfig = FontConfig.builder()
    .enabled(false)
    .build();
PdfConfig newPdfConfig = PdfConfig.builder()
    .fontConfig(newFontConfig)
    .build();
ExtractionConfig newConfig = ExtractionConfig.builder()
    .pdfOptions(newPdfConfig)
    .build();
ExtractionResult result2 = Kreuzberg.extractFile("document2.pdf", newConfig);
// Warning logged: "Font config already initialized"
```

=== "Go"

```go
// CORRECT: Set config before first extraction
config := &kreuzberg.ExtractionConfig{
    PdfOptions: &kreuzberg.PdfConfig{
        FontConfig: &kreuzberg.FontConfig{
            Enabled: true,
            CustomFontDirs: []string{"/usr/share/fonts/custom"},
        },
    },
}
result, _ := kreuzberg.ExtractFile("document.pdf", config)

// INCORRECT: Attempting to change config after first extraction
newConfig := &kreuzberg.ExtractionConfig{
    PdfOptions: &kreuzberg.PdfConfig{
        FontConfig: &kreuzberg.FontConfig{
            Enabled: false,
        },
    },
}
result2, _ := kreuzberg.ExtractFile("document2.pdf", newConfig)
// Warning logged: "Font config already initialized"
```

=== "Ruby"

```ruby
# CORRECT: Set config before first extraction
config = Kreuzberg::ExtractionConfig.new(
  pdf_options: Kreuzberg::PdfConfig.new(
    font_config: Kreuzberg::FontConfig.new(
      enabled: true,
      custom_font_dirs: ['/usr/share/fonts/custom']
    )
  )
)
result = Kreuzberg.extract_file('document.pdf', config)

# INCORRECT: Attempting to change config after first extraction
new_config = Kreuzberg::ExtractionConfig.new(
  pdf_options: Kreuzberg::PdfConfig.new(
    font_config: Kreuzberg::FontConfig.new(enabled: false)
  )
)
result2 = Kreuzberg.extract_file('document2.pdf', new_config)
# Warning logged: "Font config already initialized"
```

=== "C#"

```csharp
// CORRECT: Set config before first extraction
var fontConfig = new FontConfig
{
    Enabled = true,
    CustomFontDirs = new[] { "/usr/share/fonts/custom" }
};
var pdfConfig = new PdfConfig { FontConfig = fontConfig };
var config = new ExtractionConfig { PdfOptions = pdfConfig };
var result = Kreuzberg.ExtractFile("document.pdf", config);

// INCORRECT: Attempting to change config after first extraction
var newFontConfig = new FontConfig { Enabled = false };
var newPdfConfig = new PdfConfig { FontConfig = newFontConfig };
var newConfig = new ExtractionConfig { PdfOptions = newPdfConfig };
var result2 = Kreuzberg.ExtractFile("document2.pdf", newConfig);
// Warning logged: "Font config already initialized"
```
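This set-once behavior is the familiar "first writer wins" global. A minimal Python sketch of the pattern follows; `FontSettings` and `init_font_config` are hypothetical stand-ins rather than kreuzberg's API, and the sketch assumes the warning is logged only when a later call asks for different settings.

```python
import logging
import threading
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("font_config_sketch")


@dataclass(frozen=True)
class FontSettings:
    """Hypothetical stand-in for FontConfig; not the real kreuzberg class."""
    enabled: bool = True
    custom_font_dirs: tuple[str, ...] = ()


_lock = threading.Lock()
_active: Optional[FontSettings] = None


def init_font_config(settings: FontSettings) -> FontSettings:
    """Install `settings` on first use; later calls keep the original settings."""
    global _active
    with _lock:
        if _active is None:
            # First PDF extraction in this process: these settings win.
            _active = settings
        elif settings != _active:
            # Later, different settings are ignored, mirroring the documented warning.
            logger.warning("Font config already initialized")
        return _active
```

Because the lock covers both the check and the assignment, two threads racing to run their first extraction still agree on a single configuration. In this model, applying different font settings means starting a new process.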

Performance Impact

With default settings (enabled=true, system fonts):

  • PDF extraction: ~12-13% faster
  • Memory: Minimal increase (~100KB for font cache)
  • Startup: Lazy initialization (no overhead for non-PDF workloads)

Troubleshooting

Custom fonts not working

Symptom: PDF still uses fallback fonts

Solutions:

  1. Verify directories exist and contain .ttf/.otf/.ttc files
  2. Check logs for "Custom font directory not found" warnings
  3. Ensure paths are absolute or properly expanded
  4. Verify font files are readable
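Steps 1 and 4 can be scripted. A small Python sketch, where `diagnose_font_dir` is a hypothetical helper and the recursive search into subdirectories is an assumption (the font provider's exact directory-scanning rules are not described here):

```python
import os
from pathlib import Path

FONT_SUFFIXES = {".ttf", ".otf", ".ttc"}


def diagnose_font_dir(raw: str) -> list[str]:
    """Hypothetical helper: return human-readable problems for one font directory."""
    path = Path(raw).expanduser().resolve()
    if not path.is_dir():
        return [f"{raw}: directory not found (resolved to {path})"]

    # Assumption: fonts in subdirectories count too, so search recursively.
    fonts = [
        p for p in path.rglob("*")
        if p.is_file() and p.suffix.lower() in FONT_SUFFIXES
    ]
    problems: list[str] = []
    if not fonts:
        problems.append(f"{raw}: no .ttf/.otf/.ttc files found")
    for font in fonts:
        if not os.access(font, os.R_OK):
            problems.append(f"{font}: not readable by the current user")
    return problems
```

An empty list means the directory passed these checks; any entry points at the step to revisit.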

"Font config already initialized" warning

Symptom: Configuration changes ignored after first PDF extraction

Solution: Set FontConfig in the ExtractionConfig used for the first PDF extraction. Later changes are ignored because the font configuration is global to the process; to use different font settings, start a new process.

Performance regression

Symptom: PDF extraction slower after upgrade

Solution: This is unexpected. Please report as a bug with:

  • PDF sample (if shareable)
  • Benchmark comparison (before/after)
  • Configuration used
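For the benchmark comparison, timing the same workload under both versions is enough. A minimal sketch in Python; `median_runtime` and `percent_faster` are hypothetical helpers, and `run` stands for whatever extraction call you normally make.

```python
import statistics
import time
from typing import Callable


def median_runtime(run: Callable[[], object], repeats: int = 5) -> float:
    """Median wall-clock seconds of `run()` over `repeats` calls."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def percent_faster(before: float, after: float) -> float:
    """Positive when `after` is faster than `before` (12.5 means 12.5% faster)."""
    return (before - after) / before * 100.0
```

For example, `percent_faster(1.0, 0.875)` returns `12.5`; a negative result is the regression to report. Since the font provider initializes lazily on the first PDF extraction, taking the median over several runs (or adding a warm-up call) keeps that one-time cost from skewing the comparison.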

Questions?