Files
Henrik Jess Nielsen b4c07d3693
All checks were successful
Deploy fil (kreuzberg) / deploy (push) Successful in 49s
Nomad changes
2026-06-01 23:40:55 +02:00

3.0 KiB

name, description, priority
name description priority
plugin-architecture-patterns Plugin architecture, registration, and trait patterns critical

Plugin Architecture & Registration

Plugin Types

Type Trait Location
Document Extractor DocumentExtractor: Plugin plugins/extractor/trait.rs
OCR Backend OcrBackend: Plugin plugins/ocr/trait.rs
Post Processor PostProcessor: Plugin plugins/processor/trait.rs
Validator Validator: Plugin plugins/validator/trait.rs

DocumentExtractor Implementation

use crate::plugins::{DocumentExtractor, Plugin};
use async_trait::async_trait;

pub struct MyExtractor;

impl Plugin for MyExtractor {
    fn name(&self) -> &str { "my-extractor" }
    fn version(&self) -> String { env!("CARGO_PKG_VERSION").to_string() }
}

#[async_trait]
impl DocumentExtractor for MyExtractor {
    async fn extract_bytes(&self, content: &[u8], mime_type: &str, config: &ExtractionConfig)
        -> Result<ExtractionResult> { /* ... */ }

    fn supported_mime_types(&self) -> &[&str] { &["application/x-custom"] }
    fn priority(&self) -> i32 { 50 }

    // WASM support (optional)
    fn as_sync_extractor(&self) -> Option<&dyn SyncExtractor> { None }
}

Priority System

Range Use
0-25 Fallback/low-quality
26-49 Alternative extractors
50 Default (built-in)
51-75 Premium/enhanced
76-100 Specialized/high-priority

Registry selects highest priority extractor for each MIME type. Override built-ins with priority > 50.

Registration

// In extractors/mod.rs → register_default_extractors()
let registry = get_document_extractor_registry();
let mut registry = registry.write()
    .map_err(|e| KreuzbergError::Other(format!("Registry lock poisoned: {}", e)))?;
registry.register(Arc::new(MyExtractor::new()))?;

Feature-Gated Registration

#[cfg(feature = "office")]
{
    registry.register(Arc::new(DocxExtractor::new()))?;
    registry.register(Arc::new(PptxExtractor::new()))?;
}

PostProcessor Pattern

impl PostProcessor for MyProcessor {
    async fn process(&self, result: &mut ExtractionResult, config: &ExtractionConfig)
        -> Result<()> {
        result.content = process_content(&result.content);
        Ok(())
    }
    fn stage(&self) -> ProcessorStage { ProcessorStage::Middle }
}

Stages: EarlyMiddleLate. Failures isolated (don't block others).

Critical Rules

  1. All plugins MUST be Send + Sync
  2. Feature gate with #[cfg(feature = "...")] for optional formats
  3. Use #[async_trait] for DocumentExtractor
  4. Initialization via ensure_initialized() (lazy, called before first extraction)
  5. Plugin names: kebab-case (e.g., "pdf-extractor")