2026-06-01 23:40:55 +02:00
# Plugin API Test Fixtures
This directory contains fixtures used to generate E2E tests for the plugin, configuration, and utility APIs across all language bindings.
## Purpose
Unlike the document extraction fixtures (in the parent `fixtures/` directory), these fixtures test:
- Plugin management APIs (validators, post-processors, OCR backends, document extractors)
- Configuration loading APIs (`from_file`, `discover`)
- MIME utility APIs (`detect_mime_type`, `get_extensions_for_mime`, etc.)
## Schema
See `schema.json` for the complete JSON schema definition.
## Fixture Structure
Each fixture is a JSON file defining:
- **id**: Unique identifier (e.g., `validators_list`)
- **api_category**: Category of API (`validator_management`, `configuration`, `mime_utilities`, etc.)
- **api_function**: Function name being tested (snake_case format)
- **test_spec**: Test specification including:
  - **pattern**: Test pattern type (see patterns below)
  - **setup**: Optional setup steps (temp files, directories, etc.)
  - **function_call**: Function to call with arguments
  - **assertions**: Expected behavior and values
  - **teardown**: Optional cleanup steps
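The field layout above maps naturally onto a small typed model. The sketch below is a hypothetical Python loader (the class names and `load_fixture` are illustrative and not part of the generator, which is a Rust crate); it only enforces that required keys are present, and `schema.json` remains the authoritative definition.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FunctionCall:
    name: str
    args: list[Any] = field(default_factory=list)
    is_method: bool = False          # True for class methods such as ExtractionConfig.from_file
    class_name: str | None = None


@dataclass
class FixtureSpec:
    pattern: str
    function_call: FunctionCall
    assertions: dict[str, Any]
    setup: dict[str, Any] | None = None      # optional
    teardown: dict[str, Any] | None = None   # optional


@dataclass
class Fixture:
    id: str
    api_category: str
    api_function: str
    test_spec: FixtureSpec


def load_fixture(text: str) -> Fixture:
    """Parse fixture JSON; a missing required field raises KeyError."""
    raw = json.loads(text)
    spec = raw["test_spec"]
    call = spec["function_call"]
    return Fixture(
        id=raw["id"],
        api_category=raw["api_category"],
        api_function=raw["api_function"],
        test_spec=FixtureSpec(
            pattern=spec["pattern"],
            function_call=FunctionCall(
                name=call["name"],
                args=call.get("args", []),
                is_method=call.get("is_method", False),
                class_name=call.get("class_name"),
            ),
            assertions=spec["assertions"],
            setup=spec.get("setup"),
            teardown=spec.get("teardown"),
        ),
    )


# Illustrative fixture assembled from the simple_list example in this README.
example = """
{
  "id": "validators_list",
  "api_category": "validator_management",
  "api_function": "list_validators",
  "test_spec": {
    "pattern": "simple_list",
    "function_call": { "name": "list_validators", "args": [] },
    "assertions": { "return_type": "list", "list_item_type": "string" }
  }
}
"""
fixture = load_fixture(example)
```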
## Test Patterns
### 1. `simple_list`
Lists items from a registry. No setup required.
**Example**: `validators_list.json`
```json
{
"pattern": "simple_list",
"function_call": { "name": "list_validators", "args": [] },
"assertions": { "return_type": "list", "list_item_type": "string" }
}
```
### 2. `clear_registry`
Clears a registry and verifies it's empty.
**Example**: `validators_clear.json`
```json
{
"pattern": "clear_registry",
"function_call": { "name": "clear_validators", "args": [] },
"assertions": { "return_type": "void", "verify_cleanup": true }
}
```
### 3. `graceful_unregister`
Unregisters a nonexistent item and verifies that no error is raised.
**Example**: `ocr_backends_unregister.json`
```json
{
"pattern": "graceful_unregister",
"function_call": { "name": "unregister_ocr_backend", "args": ["nonexistent-backend-xyz"] },
"assertions": { "does_not_throw": true }
}
```
### 4. `config_from_file`
Creates a temporary TOML file, loads the config from it, and verifies the resulting properties.
**Example**: `config_from_file.json`
```json
{
"pattern": "config_from_file",
"setup": {
"create_temp_file": true,
"temp_file_name": "test_config.toml",
"temp_file_content": "[chunking]\\nmax_chars = 100\\n"
},
"function_call": {
"name": "from_file",
"is_method": true,
"class_name": "ExtractionConfig",
"args": ["${temp_file_path}"]
},
"assertions": {
"object_properties": [{ "path": "chunking.max_chars", "value": 100 }]
}
}
```
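The `object_properties` assertion addresses nested fields with a dotted `path`. Here is a minimal Python sketch of how such a path could be checked against a loaded config; `resolve_path` and `check_object_properties` are hypothetical helpers, and the `SimpleNamespace` stand-in replaces a real `ExtractionConfig`.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any


def resolve_path(obj: Any, path: str) -> Any:
    """Follow a dotted path such as "chunking.max_chars" through dict keys or attributes."""
    current = obj
    for part in path.split("."):
        current = current[part] if isinstance(current, dict) else getattr(current, part)
    return current


def check_object_properties(obj: Any, properties: list[dict[str, Any]]) -> None:
    """Raise AssertionError for the first {"path", "value"} entry that does not match."""
    for prop in properties:
        actual = resolve_path(obj, prop["path"])
        if actual != prop["value"]:
            raise AssertionError(f"{prop['path']}: expected {prop['value']!r}, got {actual!r}")


# Stand-in for the object a binding's ExtractionConfig.from_file(...) would return.
config = SimpleNamespace(chunking=SimpleNamespace(max_chars=100))
check_object_properties(config, [{"path": "chunking.max_chars", "value": 100}])
```

Accepting both dicts and attributes keeps the same fixture usable whether a binding exposes config as a plain map or as an object.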
### 5. `config_discover`
Creates a config file in a parent directory, changes into a subdirectory, and discovers the config from there.
**Example**: `config_discover.json`
- Creates `kreuzberg.toml` in a temp directory
- Creates a subdirectory and changes into it
- Calls `ExtractionConfig.discover()`
- Verifies that the config was found in the parent directory
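The fixture asserts only that a config placed in a parent directory is found. To illustrate that behavior, here is a minimal Python sketch of an upward directory search; `find_config` is hypothetical, it takes an explicit start path instead of changing the working directory, and the real `ExtractionConfig.discover()` may check other file names or locations.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

CONFIG_NAME = "kreuzberg.toml"  # file name the config_discover fixture creates


def find_config(start: Path, name: str = CONFIG_NAME) -> Path | None:
    """Return the first `name` found in `start` or any ancestor directory, else None."""
    current = start.resolve()
    for directory in (current, *current.parents):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


# Mirror the fixture steps: config in a temp dir, search starting from a subdirectory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / CONFIG_NAME).write_text("[chunking]\nmax_chars = 100\n")
    subdir = root / "nested"
    subdir.mkdir()
    found = find_config(subdir)
    expected = root / CONFIG_NAME
```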
### 6. `mime_from_bytes`
Detects MIME type from byte content.
**Example**: `mime_detect_bytes.json`
```json
{
"pattern": "mime_from_bytes",
"setup": { "test_data": "%PDF-1.4\\n" },
"function_call": { "name": "detect_mime_type", "args": ["${test_data_bytes}"] },
"assertions": { "string_contains": "pdf" }
}
```
### 7. `mime_from_path`
Creates a temporary file and detects its MIME type from the file path.
**Example**: `mime_detect_path.json`
### 8. `mime_extension_lookup`
Queries extensions for a MIME type.
**Example**: `mime_get_extensions.json`
## Variable Substitution
Fixtures can use variables in `args`:
- `${temp_file_path}` - Path to created temp file
- `${temp_dir_path}` - Path to created temp directory
- `${test_data_bytes}` - Byte data from `setup.test_data`
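Substitution happens before the call is rendered. A minimal Python sketch of one way to do it, assuming that a bare placeholder keeps its value's type (so `${test_data_bytes}` stays bytes) and that `setup.test_data` is encoded as UTF-8; `substitute_args` and the example paths are hypothetical.

```python
from __future__ import annotations

import re
from typing import Any

PLACEHOLDER = re.compile(r"\$\{([a-z_]+)\}")


def substitute_args(args: list[Any], variables: dict[str, Any]) -> list[Any]:
    """Replace ${name} placeholders in string arguments; unknown names raise KeyError."""
    result: list[Any] = []
    for arg in args:
        if not isinstance(arg, str):
            result.append(arg)  # numbers, booleans, etc. pass through unchanged
            continue
        whole = PLACEHOLDER.fullmatch(arg)
        if whole:
            # A bare placeholder keeps the value's type, e.g. bytes for ${test_data_bytes}.
            result.append(variables[whole.group(1)])
        else:
            # Placeholders embedded in longer strings are interpolated as text.
            result.append(PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), arg))
    return result


setup = {"test_data": "%PDF-1.4\n"}
variables = {
    "temp_dir_path": "/tmp/fixture-dir",  # hypothetical path
    "test_data_bytes": setup["test_data"].encode("utf-8"),  # assumed UTF-8 encoding
}
args = substitute_args(["${test_data_bytes}", "${temp_dir_path}/test_config.toml", 42], variables)
```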
## Language-Specific Handling
The generator translates fixtures to language-specific code:
### Function Names
- Fixture: `list_validators` (snake_case)
- Python: `list_validators()`
- TypeScript: `listValidators()`
- Ruby: `list_validators`
- Java: `listValidators()`
- Go: `ListValidators()`
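For regular functions the mapping above is a mechanical snake_case conversion. A hypothetical Python sketch (not the generator's actual code); it covers only regular names, so irregular mappings such as the Go class-method names listed next, or acronym-aware Go names, would need an explicit override table.

```python
def to_camel_case(name: str) -> str:
    """list_validators -> listValidators"""
    first, *rest = name.split("_")
    return first + "".join(part.capitalize() for part in rest)


def to_pascal_case(name: str) -> str:
    """list_validators -> ListValidators"""
    return "".join(part.capitalize() for part in name.split("_"))


def function_name(fixture_name: str, lang: str) -> str:
    """Map a snake_case fixture function name to the target binding's convention."""
    if lang in ("python", "ruby"):
        return fixture_name
    if lang in ("typescript", "java"):
        return to_camel_case(fixture_name)
    if lang == "go":
        return to_pascal_case(fixture_name)
    raise ValueError(f"unsupported language: {lang}")


for lang in ("python", "typescript", "ruby", "java", "go"):
    print(f"{lang}: {function_name('list_validators', lang)}")
```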
### Class Methods
- Fixture: `ExtractionConfig.from_file`
- Python: `ExtractionConfig.from_file()`
- TypeScript: `ExtractionConfig.fromFile()`
- Ruby: `Config::Extraction.from_file`
- Java: `ExtractionConfig.fromFile()`
- Go: `ConfigFromFile()`
### Temp File Handling
- Python: `tmp_path` fixture (pytest)
- TypeScript: `fs.mkdtempSync()` + `fs.rmSync()`
- Ruby: `Dir.mktmpdir { }` block
- Java: `@TempDir` annotation
- Go: `t.TempDir()`
### Assertions
- Python: `assert` statements
- TypeScript: `expect().toBe()` (Vitest)
- Ruby: `expect().to` (RSpec)
- Java: `assertEquals()` (JUnit)
- Go: `if err != nil` checks
## Special Cases
### Go Lazy Initialization
Document extractors in Go are lazily initialized. The fixture `extractors_list.json` includes:
```json
{
"setup": {
"lazy_init_required": {
"languages": ["go"],
"init_action": "extract_file_sync",
"init_data": {
"create_temp_file": true,
"temp_file_name": "test.pdf",
"temp_file_content": "%PDF-1.4\\n%EOF\\n"
}
}
}
}
```
The generator uses this block to emit Go-only setup code that extracts a temporary PDF before listing extractors, which triggers the lazy initialization.
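The `languages` list scopes the warm-up step, so other bindings skip it. A minimal Python sketch of that selection logic; `lazy_init_for` is a hypothetical helper, and the string content uses real newlines for readability.

```python
from __future__ import annotations

from typing import Any


def lazy_init_for(setup: dict[str, Any] | None, lang: str) -> dict[str, Any] | None:
    """Return the warm-up step for `lang`, or None if this language needs no lazy init."""
    lazy = (setup or {}).get("lazy_init_required")
    if lazy and lang in lazy.get("languages", []):
        return {"action": lazy["init_action"], "data": lazy.get("init_data", {})}
    return None


# Mirrors the extractors_list.json setup block shown above.
setup = {
    "lazy_init_required": {
        "languages": ["go"],
        "init_action": "extract_file_sync",
        "init_data": {
            "create_temp_file": True,
            "temp_file_name": "test.pdf",
            "temp_file_content": "%PDF-1.4\n%EOF\n",
        },
    }
}
```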
## Fixture Inventory
### Validator Management (2 fixtures)
- `validators_list.json` - List all validators
- `validators_clear.json` - Clear validators
### Post-Processor Management (2 fixtures)
- `post_processors_list.json` - List all post-processors
- `post_processors_clear.json` - Clear post-processors
### OCR Backend Management (3 fixtures)
- `ocr_backends_list.json` - List all OCR backends
- `ocr_backends_unregister.json` - Unregister nonexistent backend
- `ocr_backends_clear.json` - Clear OCR backends
### Document Extractor Management (3 fixtures)
- `extractors_list.json` - List all extractors (with Go lazy init)
- `extractors_unregister.json` - Unregister nonexistent extractor
- `extractors_clear.json` - Clear extractors
### Configuration APIs (2 fixtures)
- `config_from_file.json` - Load config from TOML file
- `config_discover.json` - Discover config from directory tree
### MIME Utilities (3 fixtures)
- `mime_detect_bytes.json` - Detect MIME from bytes
- `mime_detect_path.json` - Detect MIME from file path
- `mime_get_extensions.json` - Get extensions for MIME type
**Total**: 15 fixtures → 75 generated tests (15 per language × 5 languages)
## Regenerating Tests
After modifying fixtures, regenerate tests:
```bash
# Regenerate for all languages
cargo run -p kreuzberg-e2e-generator -- generate --lang python
cargo run -p kreuzberg-e2e-generator -- generate --lang typescript
cargo run -p kreuzberg-e2e-generator -- generate --lang ruby
cargo run -p kreuzberg-e2e-generator -- generate --lang java
cargo run -p kreuzberg-e2e-generator -- generate --lang go
```
Or use the task runner:
```bash
task e2e:generate
```
## Adding New Fixtures
1. Create a JSON file that follows `schema.json`
2. Choose an appropriate test pattern
3. Define setup/teardown steps if needed
4. Specify the assertions
5. Regenerate the tests
6. Verify that the generated tests compile and pass
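Before regenerating, a quick local check can catch mistakes that are easy to make by hand. Below is a hypothetical Python lint sketch; it complements, rather than replaces, validation against `schema.json`, and the rule that `id` equals the file name is an assumed convention inferred from the examples (`validators_list` and `validators_list.json`).

```python
from __future__ import annotations

import re
from pathlib import Path

# The eight patterns documented in this README.
KNOWN_PATTERNS = {
    "simple_list", "clear_registry", "graceful_unregister", "config_from_file",
    "config_discover", "mime_from_bytes", "mime_from_path", "mime_extension_lookup",
}
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def lint_fixture(filename: str, fixture: dict) -> list[str]:
    """Return human-readable problems; an empty list means these checks passed."""
    problems: list[str] = []
    spec = fixture.get("test_spec", {})
    if spec.get("pattern") not in KNOWN_PATTERNS:
        problems.append(f"unknown pattern: {spec.get('pattern')!r}")
    api_function = fixture.get("api_function", "")
    if not SNAKE_CASE.fullmatch(api_function):
        problems.append(f"api_function is not snake_case: {api_function!r}")
    stem = Path(filename).stem
    if fixture.get("id") != stem:  # assumed convention: id matches the file name
        problems.append(f"id {fixture.get('id')!r} does not match file name {stem!r}")
    return problems


good = {"id": "validators_list", "api_function": "list_validators",
        "test_spec": {"pattern": "simple_list"}}
bad = {"id": "validatorsList", "api_function": "listValidators",
       "test_spec": {"pattern": "list"}}
print(lint_fixture("validators_list.json", good))  # []
print(lint_fixture("validators_list.json", bad))
```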
## Notes
- **DO NOT** write E2E tests by hand.
- **ALL** E2E tests must be generated from fixtures.
- This is a non-negotiable architectural rule.
- Hand-written tests will be rejected by CI.