Nomad changes
Henrik Jess Nielsen
2026-06-01 23:40:55 +02:00
parent 72b1a0a6ed
commit b4c07d3693
5723 changed files with 1130655 additions and 0 deletions

@@ -0,0 +1,279 @@
# Plugin API Test Fixtures
This directory contains fixtures for generating E2E tests for plugin/config/utility APIs across all language bindings.
## Purpose
Unlike document extraction fixtures (in parent `fixtures/` directory), these fixtures test:
- Plugin management APIs (validators, post-processors, OCR backends, document extractors)
- Configuration loading APIs (`from_file`, `discover`)
- MIME utility APIs (`detect_mime_type`, `get_extensions_for_mime`, etc.)
## Schema
See `schema.json` for the complete JSON schema definition.
## Fixture Structure
Each fixture is a JSON file defining:
- **id**: Unique identifier (e.g., `validators_list`)
- **api_category**: Category of API (`validator_management`, `configuration`, `mime_utilities`, etc.)
- **api_function**: Function name being tested (snake_case format)
- **test_spec**: Test specification including:
  - **pattern**: Test pattern type (see patterns below)
  - **setup**: Optional setup steps (temp files, directories, etc.)
  - **function_call**: Function to call with arguments
  - **assertions**: Expected behavior and values
  - **teardown**: Optional cleanup steps
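
The field list above can be checked mechanically before running the generator. The following is a minimal sketch in Python, assuming every field not marked optional above is required; `schema.json` remains the authoritative definition, and `missing_fields` is a hypothetical helper, not part of the generator.

```python
import json

# Assumption: fields not described as "Optional" in the list above are required.
REQUIRED_TOP_LEVEL = ("id", "api_category", "api_function", "test_spec")
REQUIRED_TEST_SPEC = ("pattern", "function_call", "assertions")


def missing_fields(fixture: dict) -> list:
    """Return dotted names of required fields absent from a fixture."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in fixture]
    spec = fixture.get("test_spec", {})
    missing += [f"test_spec.{key}" for key in REQUIRED_TEST_SPEC if key not in spec]
    return missing


# Usage: a fixture that is missing its assertions block.
raw = """
{
  "id": "validators_list",
  "api_category": "validator_management",
  "api_function": "list_validators",
  "test_spec": {
    "pattern": "simple_list",
    "function_call": { "name": "list_validators", "args": [] }
  }
}
"""
print(missing_fields(json.loads(raw)))
```

A check like this only catches absent keys; type and enum constraints still need the JSON schema.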
## Test Patterns
### 1. `simple_list`
Lists items from a registry. No setup required.
**Example**: `validators_list.json`
```json
{
  "pattern": "simple_list",
  "function_call": { "name": "list_validators", "args": [] },
  "assertions": { "return_type": "list", "list_item_type": "string" }
}
```
### 2. `clear_registry`
Clears a registry and verifies it's empty.
**Example**: `validators_clear.json`
```json
{
  "pattern": "clear_registry",
  "function_call": { "name": "clear_validators", "args": [] },
  "assertions": { "return_type": "void", "verify_cleanup": true }
}
```
### 3. `graceful_unregister`
Attempts to unregister a nonexistent item without error.
**Example**: `ocr_backends_unregister.json`
```json
{
  "pattern": "graceful_unregister",
  "function_call": { "name": "unregister_ocr_backend", "args": ["nonexistent-backend-xyz"] },
  "assertions": { "does_not_throw": true }
}
```
### 4. `config_from_file`
Creates a temp TOML file, loads config, verifies properties.
**Example**: `config_from_file.json`
```json
{
  "pattern": "config_from_file",
  "setup": {
    "create_temp_file": true,
    "temp_file_name": "test_config.toml",
    "temp_file_content": "[chunking]\nmax_chars = 100\n"
  },
  "function_call": {
    "name": "from_file",
    "is_method": true,
    "class_name": "ExtractionConfig",
    "args": ["${temp_file_path}"]
  },
  "assertions": {
    "object_properties": [{ "path": "chunking.max_chars", "value": 100 }]
  }
}
```
### 5. `config_discover`
Creates config in parent dir, changes to subdirectory, discovers config.
**Example**: `config_discover.json`
- Creates `kreuzberg.toml` in temp dir
- Creates subdirectory and changes to it
- Calls `ExtractionConfig.discover()`
- Verifies config was found from parent
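
The behavior this pattern exercises is an upward search through parent directories. The following is a minimal sketch in Python of that technique, assuming the search starts at the given directory and stops at the first `kreuzberg.toml` found; `discover_config` is a hypothetical helper for illustration and not the implementation behind `ExtractionConfig.discover()`.

```python
from pathlib import Path
from typing import Optional


def discover_config(start: Path, filename: str = "kreuzberg.toml") -> Optional[Path]:
    """Walk from `start` up to the filesystem root and return the first match."""
    start = start.resolve()
    # Path.parents yields each ancestor in order, nearest first.
    for directory in (start, *start.parents):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None
```

A generated test would call this from the subdirectory and assert that the returned path points at the file created in the parent.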
### 6. `mime_from_bytes`
Detects MIME type from byte content.
**Example**: `mime_detect_bytes.json`
```json
{
  "pattern": "mime_from_bytes",
  "setup": { "test_data": "%PDF-1.4\n" },
  "function_call": { "name": "detect_mime_type", "args": ["${test_data_bytes}"] },
  "assertions": { "string_contains": "pdf" }
}
```
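
Detection from bytes generally relies on leading "magic" signatures, which is why a short `%PDF-1.4` header is enough for this fixture. The following is a minimal sketch in Python of signature sniffing, assuming a two-entry table; `sniff_mime` and `SIGNATURES` are hypothetical names, and the real `detect_mime_type` may use a different and far more complete mechanism.

```python
from typing import Optional

# Well-known file signatures: PDF files start with "%PDF-",
# PNG files start with an 8-byte fixed header.
SIGNATURES = (
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
)


def sniff_mime(data: bytes) -> Optional[str]:
    """Return a MIME type if `data` starts with a known signature."""
    for prefix, mime in SIGNATURES:
        if data.startswith(prefix):
            return mime
    return None
```

The `string_contains` assertion in the fixture maps onto a substring check such as `"pdf" in sniff_mime(data)`.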
### 7. `mime_from_path`
Creates temp file, detects MIME from path.
**Example**: `mime_detect_path.json`
### 8. `mime_extension_lookup`
Queries extensions for a MIME type.
**Example**: `mime_get_extensions.json`
## Variable Substitution
Fixtures can use variables in `args`:
- `${temp_file_path}` - Path to created temp file
- `${temp_dir_path}` - Path to created temp directory
- `${test_data_bytes}` - Byte data from `setup.test_data`
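
The substitution step can be sketched as follows in Python. This is an illustration under stated assumptions, not the generator's code: `substitute_args` is a hypothetical helper, and it assumes a placeholder that makes up a whole argument should keep the variable's raw type (so `${test_data_bytes}` stays `bytes`), while a placeholder embedded in a longer string is interpolated as text.

```python
import re
from typing import Any, List, Mapping

_VAR = re.compile(r"\$\{(\w+)\}")


def substitute_args(args: List[Any], variables: Mapping[str, Any]) -> List[Any]:
    """Replace ${name} placeholders in string args; other values pass through."""

    def resolve(arg: Any) -> Any:
        if not isinstance(arg, str):
            return arg
        whole = _VAR.fullmatch(arg)
        if whole:
            # The argument is exactly one placeholder: keep the raw value.
            return variables[whole.group(1)]
        # Placeholder embedded in a longer string: interpolate as text.
        return _VAR.sub(lambda m: str(variables[m.group(1)]), arg)

    return [resolve(arg) for arg in args]
```

An unknown variable name raises `KeyError`, which surfaces typos in a fixture early instead of passing a literal `${...}` string to the API under test.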
## Language-Specific Handling
The generator translates fixtures to language-specific code:
### Function Names
- Fixture: `list_validators` (snake_case)
- Python: `list_validators()`
- TypeScript: `listValidators()`
- Ruby: `list_validators`
- Java: `listValidators()`
- Go: `ListValidators()`
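
The mapping above amounts to a case conversion per language. The following is a minimal sketch in Python, assuming fixture names are lowercase snake_case; `function_name_for` and the `STYLE` table are hypothetical names used only for illustration.

```python
def to_camel(snake: str) -> str:
    """list_validators -> listValidators"""
    head, *rest = snake.split("_")
    return head + "".join(part.capitalize() for part in rest)


def to_pascal(snake: str) -> str:
    """list_validators -> ListValidators"""
    return "".join(part.capitalize() for part in snake.split("_"))


# Python and Ruby keep snake_case; TypeScript and Java use camelCase;
# Go exports functions with PascalCase.
STYLE = {
    "python": str,
    "ruby": str,
    "typescript": to_camel,
    "java": to_camel,
    "go": to_pascal,
}


def function_name_for(language: str, fixture_name: str) -> str:
    """Translate a fixture's snake_case function name for one language."""
    return STYLE[language](fixture_name)
```

This covers free functions only. Class methods need a per-language lookup as well, since the Ruby and Go forms listed under Class Methods are not pure case conversions.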
### Class Methods
- Fixture: `ExtractionConfig.from_file`
- Python: `ExtractionConfig.from_file()`
- TypeScript: `ExtractionConfig.fromFile()`
- Ruby: `Config::Extraction.from_file`
- Java: `ExtractionConfig.fromFile()`
- Go: `ConfigFromFile()`
### Temp File Handling
- Python: `tmp_path` fixture (pytest)
- TypeScript: `fs.mkdtempSync()` + `fs.rmSync()`
- Ruby: `Dir.mktmpdir { }` block
- Java: `@TempDir` annotation
- Go: `t.TempDir()`
### Assertions
- Python: `assert` statements
- TypeScript: `expect().toBe()` (Vitest)
- Ruby: `expect().to` (RSpec)
- Java: `assertEquals()` (JUnit)
- Go: `if err != nil` checks
## Special Cases
### Go Lazy Initialization
Document extractors in Go are lazily initialized. The fixture `extractors_list.json` includes:
```json
{
  "setup": {
    "lazy_init_required": {
      "languages": ["go"],
      "init_action": "extract_file_sync",
      "init_data": {
        "create_temp_file": true,
        "temp_file_name": "test.pdf",
        "temp_file_content": "%PDF-1.4\n%EOF\n"
      }
    }
  }
}
```
The generator will produce Go-specific setup code to extract a PDF before listing extractors.
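
Whether a given language needs this extra setup can be decided from the `languages` list alone. The following is a minimal sketch in Python; `needs_lazy_init` is a hypothetical helper, and it assumes the `setup` object may sit either at the top level (as in the excerpt above) or under `input` (as in the committed `extractors_list` fixture).

```python
def needs_lazy_init(fixture: dict, language: str) -> bool:
    """True if the fixture requests lazy-init setup for `language`."""
    # Assumption: accept both placements of "setup" seen in this directory.
    setup = fixture.get("setup") or fixture.get("input", {}).get("setup") or {}
    lazy = setup.get("lazy_init_required") or {}
    return language in lazy.get("languages", [])
```

Fixtures without a `lazy_init_required` block return `False` for every language, so the check is safe to run on all fixtures.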
## Fixture Inventory
### Validator Management (2 fixtures)
- `validators_list.json` - List all validators
- `validators_clear.json` - Clear validators
### Post-Processor Management (2 fixtures)
- `post_processors_list.json` - List all post-processors
- `post_processors_clear.json` - Clear post-processors
### OCR Backend Management (3 fixtures)
- `ocr_backends_list.json` - List all OCR backends
- `ocr_backends_unregister.json` - Unregister nonexistent backend
- `ocr_backends_clear.json` - Clear OCR backends
### Document Extractor Management (3 fixtures)
- `extractors_list.json` - List all extractors (with Go lazy init)
- `extractors_unregister.json` - Unregister nonexistent extractor
- `extractors_clear.json` - Clear extractors
### Configuration APIs (2 fixtures)
- `config_from_file.json` - Load config from TOML file
- `config_discover.json` - Discover config from directory tree
### MIME Utilities (3 fixtures)
- `mime_detect_bytes.json` - Detect MIME from bytes
- `mime_detect_path.json` - Detect MIME from file path
- `mime_get_extensions.json` - Get extensions for MIME type
**Total**: 15 fixtures → 75 generated tests (15 per language × 5 languages)
## Regenerating Tests
After modifying fixtures, regenerate tests:
```bash
# Regenerate for all languages
cargo run -p kreuzberg-e2e-generator -- generate --lang python
cargo run -p kreuzberg-e2e-generator -- generate --lang typescript
cargo run -p kreuzberg-e2e-generator -- generate --lang ruby
cargo run -p kreuzberg-e2e-generator -- generate --lang java
cargo run -p kreuzberg-e2e-generator -- generate --lang go
```
Or use the task runner:
```bash
task e2e:generate
```
## Adding New Fixtures
1. Create JSON file following `schema.json`
2. Choose appropriate test pattern
3. Define setup/teardown if needed
4. Specify assertions
5. Regenerate tests
6. Verify tests compile and pass
## Notes
- **DO NOT** write E2E tests by hand
- **ALL** E2E tests must be generated from fixtures
- This is a non-negotiable architectural rule
- Hand-written tests will be rejected by CI

@@ -0,0 +1,17 @@
{
"id": "document_extractors_clear",
"category": "document_extractor_management",
"description": "Clear all document extractors and verify list is empty",
"tags": [
"document_extractor",
"plugin_management",
"clear",
"trait-bridge"
],
"call": "clear_document_extractors",
"assertions": [
{
"type": "not_error"
}
]
}

@@ -0,0 +1,17 @@
{
"id": "embedding_backends_clear",
"category": "embedding_backend_management",
"description": "Clear all embedding backends and verify list is empty",
"tags": [
"embedding",
"plugin_management",
"clear",
"trait-bridge"
],
"call": "clear_embedding_backends",
"assertions": [
{
"type": "not_error"
}
]
}

@@ -0,0 +1,12 @@
{
"id": "embedding_backends_list",
"category": "embedding_backend_management",
"description": "List all registered embedding backends",
"tags": ["embedding", "plugin_management", "list"],
"call": "list_embedding_backends",
"assertions": [
{
"type": "not_error"
}
]
}

@@ -0,0 +1,25 @@
{
"id": "extractors_list",
"category": "document_extractor_management",
"description": "List all registered document extractors",
"tags": ["extractors", "plugin_management", "list"],
"call": "list_document_extractors",
"input": {
"setup": {
"lazy_init_required": {
"languages": ["go"],
"init_action": "extract_file_sync",
"init_data": {
"create_temp_file": true,
"temp_file_name": "test.pdf",
"temp_file_content": "%PDF-1.4\n%EOF\n"
}
}
}
},
"assertions": [
{
"type": "not_error"
}
]
}

@@ -0,0 +1,17 @@
{
"id": "mime_detect_bytes",
"category": "mime_utilities",
"description": "Detect MIME type from file bytes",
"tags": ["mime", "detection", "bytes"],
"call": "detect_mime_type_from_bytes",
"input": {
"data": "pdf/fake_memo.pdf"
},
"assertions": [
{
"type": "contains",
"field": "result",
"value": "pdf"
}
]
}

@@ -0,0 +1,17 @@
{
"id": "mime_detect_image",
"category": "mime_utilities",
"description": "Detect MIME type from PNG image bytes",
"tags": ["mime", "detection", "image", "bytes"],
"call": "detect_mime_type_from_bytes",
"input": {
"data": "images/test_hello_world.png"
},
"assertions": [
{
"type": "contains",
"field": "result",
"value": "png"
}
]
}

@@ -0,0 +1,17 @@
{
"id": "mime_get_extensions",
"category": "mime_utilities",
"description": "Get file extensions for a MIME type",
"tags": ["mime", "extensions", "lookup"],
"call": "get_extensions_for_mime",
"input": {
"mime_type": "application/pdf"
},
"assertions": [
{
"type": "contains",
"field": "result",
"value": "pdf"
}
]
}

@@ -0,0 +1,17 @@
{
"id": "ocr_backends_clear",
"category": "ocr_backend_management",
"description": "Clear all OCR backends and verify list is empty",
"tags": [
"ocr",
"plugin_management",
"clear",
"trait-bridge"
],
"call": "clear_ocr_backends",
"assertions": [
{
"type": "not_error"
}
]
}

@@ -0,0 +1,12 @@
{
"id": "ocr_backends_list",
"category": "ocr_backend_management",
"description": "List all registered OCR backends",
"tags": ["ocr", "plugin_management", "list"],
"call": "list_ocr_backends",
"assertions": [
{
"type": "not_error"
}
]
}

@@ -0,0 +1,20 @@
{
"id": "ocr_backends_unregister",
"category": "ocr_backend_management",
"description": "Unregister nonexistent OCR backend gracefully",
"tags": [
"ocr",
"plugin_management",
"unregister",
"trait-bridge"
],
"call": "unregister_ocr_backend",
"input": {
"name": "nonexistent-backend-xyz"
},
"assertions": [
{
"type": "not_error"
}
]
}

@@ -0,0 +1,17 @@
{
"id": "post_processors_clear",
"category": "post_processor_management",
"description": "Clear all post-processors and verify list is empty",
"tags": [
"post_processors",
"plugin_management",
"clear",
"trait-bridge"
],
"call": "clear_post_processors",
"assertions": [
{
"type": "not_error"
}
]
}

@@ -0,0 +1,12 @@
{
"id": "post_processors_list",
"category": "post_processor_management",
"description": "List all registered post-processors",
"tags": ["post_processors", "plugin_management", "list"],
"call": "list_post_processors",
"assertions": [
{
"type": "not_error"
}
]
}

@@ -0,0 +1,28 @@
{
"id": "register_document_extractor_trait_bridge",
"category": "plugin_api",
"description": "register_document_extractor: trait bridge",
"tags": [
"trait-bridge"
],
"call": "register_document_extractor",
"input": {
"extractor": {
"type": "test",
"name": "test-extractor"
}
},
"args": [
{
"name": "extractor",
"field": "extractor",
"arg_type": "test_backend",
"trait": "DocumentExtractor"
}
],
"assertions": [
{
"type": "not_error"
}
]
}

@@ -0,0 +1,29 @@
{
"id": "register_embedding_backend_trait_bridge",
"category": "plugin_api",
"description": "register_embedding_backend: trait bridge",
"tags": [
"trait-bridge"
],
"call": "register_embedding_backend",
"input": {
"backend": {
"type": "test",
"name": "test-embedding-backend",
"dimensions": 768
}
},
"args": [
{
"name": "backend",
"field": "backend",
"arg_type": "test_backend",
"trait": "EmbeddingBackend"
}
],
"assertions": [
{
"type": "not_error"
}
]
}

@@ -0,0 +1,28 @@
{
"id": "register_ocr_backend_trait_bridge",
"category": "plugin_api",
"description": "register_ocr_backend: trait bridge",
"tags": [
"trait-bridge"
],
"call": "register_ocr_backend",
"input": {
"backend": {
"type": "test",
"name": "test-backend"
}
},
"args": [
{
"name": "backend",
"field": "backend",
"arg_type": "test_backend",
"trait": "OcrBackend"
}
],
"assertions": [
{
"type": "not_error"
}
]
}

@@ -0,0 +1,28 @@
{
"id": "register_post_processor_trait_bridge",
"category": "plugin_api",
"description": "register_post_processor: trait bridge",
"tags": [
"trait-bridge"
],
"call": "register_post_processor",
"input": {
"processor": {
"type": "test",
"name": "test-processor"
}
},
"args": [
{
"name": "processor",
"field": "processor",
"arg_type": "test_backend",
"trait": "PostProcessor"
}
],
"assertions": [
{
"type": "not_error"
}
]
}

@@ -0,0 +1,28 @@
{
"id": "register_renderer_trait_bridge",
"category": "plugin_api",
"description": "register_renderer: trait bridge",
"tags": [
"trait-bridge"
],
"call": "register_renderer",
"input": {
"renderer": {
"type": "test",
"name": "test-renderer"
}
},
"args": [
{
"name": "renderer",
"field": "renderer",
"arg_type": "test_backend",
"trait": "Renderer"
}
],
"assertions": [
{
"type": "not_error"
}
]
}

@@ -0,0 +1,28 @@
{
"id": "register_validator_trait_bridge",
"category": "plugin_api",
"description": "register_validator: trait bridge",
"tags": [
"trait-bridge"
],
"call": "register_validator",
"input": {
"validator": {
"type": "test",
"name": "test-validator"
}
},
"args": [
{
"name": "validator",
"field": "validator",
"arg_type": "test_backend",
"trait": "Validator"
}
],
"assertions": [
{
"type": "not_error"
}
]
}

@@ -0,0 +1,17 @@
{
"id": "renderers_clear",
"category": "renderer_management",
"description": "Clear all renderers and verify list is empty",
"tags": [
"renderer",
"plugin_management",
"clear",
"trait-bridge"
],
"call": "clear_renderers",
"assertions": [
{
"type": "not_error"
}
]
}

@@ -0,0 +1,12 @@
{
"id": "renderers_list",
"category": "renderer_management",
"description": "List all registered renderers",
"tags": ["renderer", "plugin_management", "list"],
"call": "list_renderers",
"assertions": [
{
"type": "not_error"
}
]
}

@@ -0,0 +1,18 @@
{
"id": "unregister_document_extractor_after_register",
"category": "plugin_api",
"description": "unregister_document_extractor",
"call": "unregister_document_extractor",
"input": {
"name": "test-extractor"
},
"assertions": [
{
"type": "not_error"
}
],
"tags": [
"plugin-lifecycle",
"trait-bridge"
]
}

@@ -0,0 +1,18 @@
{
"id": "unregister_embedding_backend_after_register",
"category": "plugin_api",
"description": "unregister_embedding_backend",
"call": "unregister_embedding_backend",
"input": {
"name": "test-embedding-backend"
},
"assertions": [
{
"type": "not_error"
}
],
"tags": [
"plugin-lifecycle",
"trait-bridge"
]
}

@@ -0,0 +1,18 @@
{
"id": "unregister_post_processor_after_register",
"category": "plugin_api",
"description": "unregister_post_processor",
"call": "unregister_post_processor",
"input": {
"name": "test-processor"
},
"assertions": [
{
"type": "not_error"
}
],
"tags": [
"plugin-lifecycle",
"trait-bridge"
]
}

@@ -0,0 +1,18 @@
{
"id": "unregister_renderer_after_register",
"category": "plugin_api",
"description": "unregister_renderer",
"call": "unregister_renderer",
"input": {
"name": "test-renderer"
},
"assertions": [
{
"type": "not_error"
}
],
"tags": [
"plugin-lifecycle",
"trait-bridge"
]
}

@@ -0,0 +1,18 @@
{
"id": "unregister_validator_after_register",
"category": "plugin_api",
"description": "unregister_validator",
"call": "unregister_validator",
"input": {
"name": "test-validator"
},
"assertions": [
{
"type": "not_error"
}
],
"tags": [
"plugin-lifecycle",
"trait-bridge"
]
}

@@ -0,0 +1,17 @@
{
"id": "validators_clear",
"category": "validator_management",
"description": "Clear all validators and verify list is empty",
"tags": [
"validators",
"plugin_management",
"clear",
"trait-bridge"
],
"call": "clear_validators",
"assertions": [
{
"type": "not_error"
}
]
}

@@ -0,0 +1,12 @@
{
"id": "validators_list",
"category": "validator_management",
"description": "List all registered validators",
"tags": ["validators", "plugin_management", "list"],
"call": "list_validators",
"assertions": [
{
"type": "not_error"
}
]
}