# PHP Plugin System - Deferred to Future Version
## Status: Not Yet Implemented
The PHP plugin system for Kreuzberg is **deferred to a future version**. This includes:
- Custom OCR backend registration
- Post-processor plugins
- Validator plugins
- Custom extractor plugins
## Why Deferred?
The plugin system requires complex callback handling between Rust and PHP through ext-php-rs. Specifically:
1. **Callback Challenges**: ext-php-rs callback support for complex interfaces is still evolving
2. **Memory Safety**: Ensuring proper lifetime management for PHP closures called from Rust
3. **Error Handling**: Propagating exceptions across the FFI boundary in plugin contexts
4. **Performance**: Minimizing overhead of cross-language callbacks in hot paths
## Affected Functions (16 functions)
The following functions exist in Python, Ruby, Node.js, and other bindings but are not yet available in PHP:
### OCR Backend Registration
- `kreuzberg_register_ocr_backend()`
- `kreuzberg_unregister_ocr_backend()`
- `kreuzberg_list_ocr_backends()`
### Post-Processor Plugins
- `kreuzberg_register_post_processor()`
- `kreuzberg_unregister_post_processor()`
- `kreuzberg_list_post_processors()`
- `kreuzberg_clear_post_processors()`
### Validator Plugins
- `kreuzberg_register_validator()`
- `kreuzberg_unregister_validator()`
- `kreuzberg_list_validators()`
- `kreuzberg_clear_validators()`
### Custom Extractor Plugins
- `kreuzberg_register_extractor()`
- `kreuzberg_unregister_extractor()`
- `kreuzberg_list_extractors()`
- `kreuzberg_clear_extractors()`
### Plugin Testing
- `kreuzberg_test_plugin()`
## Workarounds
Until the plugin system is implemented, you can:
### 1. Post-Process Results in PHP
Instead of registering a post-processor plugin, process the extraction result directly:
```php title="Post-Process Results"
<?php

declare(strict_types=1);

use Kreuzberg\Kreuzberg;
use Kreuzberg\Types\ExtractionResult;

function postProcessResult(ExtractionResult $result): ExtractionResult
{
    // Custom post-processing logic
    $processedContent = strtoupper($result->content);

    // Return a new result with modified content
    return new ExtractionResult(
        content: $processedContent,
        mimeType: $result->mimeType,
        metadata: $result->metadata,
        tables: $result->tables,
        images: $result->images,
        chunks: $result->chunks,
    );
}

$kreuzberg = new Kreuzberg();
$result = $kreuzberg->extractFile('document.pdf');
$processed = postProcessResult($result);
```
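The post-processor functions listed under Affected Functions (register, unregister, list, clear) follow a simple named-registry pattern, which can be approximated in plain PHP until native plugins land. The following is a minimal sketch, not part of the Kreuzberg API: `PostProcessorRegistry` and its methods are hypothetical names, and the processors work on the plain `content` string you would read from an `ExtractionResult`.

```php title="Userland Post-Processor Registry (sketch)"
<?php

declare(strict_types=1);

/**
 * Hypothetical userland stand-in for the deferred post-processor
 * registration functions. Not part of the Kreuzberg API.
 */
final class PostProcessorRegistry
{
    /** @var array<string, callable(string): string> */
    private array $processors = [];

    // Register a named processor; re-registering a name replaces it in place.
    public function register(string $name, callable $processor): void
    {
        $this->processors[$name] = $processor;
    }

    public function unregister(string $name): void
    {
        unset($this->processors[$name]);
    }

    /** @return list<string> registered names, in registration order */
    public function names(): array
    {
        return array_keys($this->processors);
    }

    public function clear(): void
    {
        $this->processors = [];
    }

    // Run every registered processor over the content, in order.
    public function apply(string $content): string
    {
        foreach ($this->processors as $processor) {
            $content = $processor($content);
        }
        return $content;
    }
}

$registry = new PostProcessorRegistry();
$registry->register('trim', 'trim');
$registry->register(
    'collapse_whitespace',
    static fn (string $s): string => preg_replace('/\s+/', ' ', $s) ?? $s,
);

echo $registry->apply("  Hello \n\n  world  "), PHP_EOL; // Hello world
```

To apply it to an extraction, pass `$result->content` through `$registry->apply()` and build the new `ExtractionResult` the same way `postProcessResult()` does. Keeping processors keyed by name makes it easy to drop or swap one step without touching the others.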
### 2. Use Built-in OCR Backends
PHP bindings support all built-in OCR backends:
```php title="Built-in OCR Backends"
<?php
declare(strict_types=1);
use Kreuzberg\Config\ExtractionConfig;
use Kreuzberg\Config\OcrConfig;
use Kreuzberg\Kreuzberg;
$config = new ExtractionConfig(
ocr: new OcrConfig(
backend: 'tesseract', // Built-in: tesseract, apple-vision (macOS)
language: 'eng',
),
);
$kreuzberg = new Kreuzberg($config);
$result = $kreuzberg->extractFile('scanned.pdf');
```
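Because `apple-vision` is listed as macOS-only, a portable script may want to choose the backend at runtime. This is a hypothetical helper (`defaultOcrBackend()` is not part of Kreuzberg) that assumes `tesseract` is available on every platform; confirm which backends your build actually provides.

```php title="Choose OCR Backend per Platform (sketch)"
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: pick an OCR backend name for the given OS family.
 * Assumes 'apple-vision' is only usable on macOS and 'tesseract' everywhere.
 */
function defaultOcrBackend(string $osFamily = PHP_OS_FAMILY): string
{
    // PHP_OS_FAMILY (PHP >= 7.2) is 'Darwin' on macOS.
    return $osFamily === 'Darwin' ? 'apple-vision' : 'tesseract';
}

echo defaultOcrBackend('Linux'), PHP_EOL;  // tesseract
echo defaultOcrBackend('Darwin'), PHP_EOL; // apple-vision
```

Its return value could be passed as the `backend:` argument of `OcrConfig`.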
### 3. Validate Results in PHP
Instead of validator plugins, validate extraction results directly:
```php title="Validate Results"
<?php

declare(strict_types=1);

use Kreuzberg\Exceptions\ValidationException;
use Kreuzberg\Kreuzberg;
use Kreuzberg\Types\ExtractionResult;

function validateResult(ExtractionResult $result): void
{
    if (strlen($result->content) < 100) {
        throw new ValidationException('Content too short (minimum 100 characters)');
    }

    if ($result->metadata?->pageCount === 0) {
        throw new ValidationException('Document has no pages');
    }
}

$kreuzberg = new Kreuzberg();
$result = $kreuzberg->extractFile('document.pdf');
validateResult($result);
```
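`validateResult()` throws on the first failed check. If you would rather collect every failure in one pass, the checks can be held as a list of named callables. This is a sketch under assumptions: `runValidators()` and the validator signature `(string $content, ?int $pageCount): ?string` are hypothetical, not Kreuzberg API, and you would feed them `$result->content` and `$result->metadata?->pageCount`.

```php title="Collect All Validation Errors (sketch)"
<?php

declare(strict_types=1);

/**
 * Hypothetical userland validator chain. Each validator returns an
 * error message, or null when its check passes.
 *
 * @param array<string, callable(string, ?int): ?string> $validators
 * @return list<string> all error messages; empty if every check passed
 */
function runValidators(array $validators, string $content, ?int $pageCount): array
{
    $errors = [];
    foreach ($validators as $name => $validator) {
        $error = $validator($content, $pageCount);
        if ($error !== null) {
            $errors[] = "{$name}: {$error}";
        }
    }
    return $errors;
}

$validators = [
    'min_length' => static fn (string $content, ?int $pages): ?string =>
        strlen($content) < 100 ? 'Content too short (minimum 100 characters)' : null,
    'has_pages' => static fn (string $content, ?int $pages): ?string =>
        $pages === 0 ? 'Document has no pages' : null,
];

$errors = runValidators($validators, 'short text', 0);
echo implode(PHP_EOL, $errors), PHP_EOL;
// Prints:
// min_length: Content too short (minimum 100 characters)
// has_pages: Document has no pages
```

To keep the fail-fast behavior of the original, throw a `ValidationException` whenever the returned list is non-empty.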
### 4. Extend the Kreuzberg Class
For application-specific functionality, extend the main class:
```php title="Extend Kreuzberg Class"
<?php

declare(strict_types=1);

use Kreuzberg\Config\ExtractionConfig;
use Kreuzberg\Kreuzberg as BaseKreuzberg;
use Kreuzberg\Types\ExtractionResult;

final class CustomKreuzberg extends BaseKreuzberg
{
    public function extractAndValidate(
        string $path,
        ?ExtractionConfig $config = null
    ): ExtractionResult {
        $result = $this->extractFile($path, $config);

        // Custom validation
        if (strlen($result->content) < 100) {
            throw new \RuntimeException('Content too short');
        }

        return $result;
    }

    public function extractAndTransform(
        string $path,
        callable $transformer,
        ?ExtractionConfig $config = null
    ): ExtractionResult {
        $result = $this->extractFile($path, $config);

        // Custom transformation
        $transformedContent = $transformer($result->content);

        return new ExtractionResult(
            content: $transformedContent,
            mimeType: $result->mimeType,
            metadata: $result->metadata,
            tables: $result->tables,
            images: $result->images,
            chunks: $result->chunks,
        );
    }
}
```
## Timeline
The plugin system is planned for a future PHP bindings release (tentatively v4.1.0 or v4.2.0), pending:
1. Improvements to ext-php-rs support for complex callbacks
2. Comprehensive testing of callback performance and safety
3. Documentation of plugin interfaces
## Current Feature Parity
Apart from the deferred plugin system, the PHP bindings achieve **95% feature parity** with the other language bindings:
- ✅ All extraction functions (file, bytes, batch)
- ✅ All configuration options (OCR, PDF, chunking, embeddings)
- ✅ All result types (tables, images, chunks, metadata)
- ✅ All validation functions (14 validators)
- ✅ Embedding presets (2 functions + class)
- ✅ Error classification (3 functions + class)
- ✅ Config helpers (JSON export, field access, merging)
- ❌ Plugin system (16 functions) - **deferred**
## Questions?
For questions about the plugin system or to request early access when available:
- GitHub Issues: <https://github.com/kreuzberg-dev/kreuzberg/issues>
- Discussions: <https://github.com/kreuzberg-dev/kreuzberg/discussions>
## Contributing
If you're interested in helping implement the plugin system for PHP:
1. Review the plugin implementation in the Python bindings (`crates/kreuzberg-py/src/plugins.rs`)
2. Review the ext-php-rs documentation on callbacks
3. Open a discussion on the Kreuzberg GitHub repository

We welcome contributions!