# Elixir Binding Systematic Bug Audit

**Audit Date**: 2026-05-30
**Repo**: `packages/elixir/` + `e2e/elixir/`
**Status**: 28/28 e2e tests green (before audit)

## Executive Summary

Found **2 critical bugs**, **1 medium bug**, and **3 gaps** in the Elixir NIF binding:

1. **CRITICAL**: Long-running NIFs lack dirty scheduling, which blocks BEAM schedulers
2. **CRITICAL**: Thread panics are not safely handled, which can crash the BEAM VM
3. **MEDIUM**: Error tuple shape differs between the NIFs and the Elixir specs
4. **HIGH**: No Dialyzer configuration, so type specs are not validated
5. **MEDIUM**: Error paths are weakly tested in `e2e/elixir/`
6. **MISSING**: No `mix_audit` in CI

---

## Findings

### BINDING_BUG #1: Scheduler Violation (Long-Running NIFs Without Dirty Scheduling)

**Severity**: CRITICAL

**Issue**: NIFs that run for more than ~1ms execute on a normal scheduler, where they block the scheduler thread and stall every other process queued on it.

**Lines in NIF**: `packages/elixir/native/kreuzberg_nif/src/lib.rs`

#### Long-Running but Unscheduled (MUST FIX)

1. **`extract_file_sync` (line 3421)**: calls `kreuzberg::extract_file_sync`
   - Performs I/O + parsing; easily >10ms
   - Currently: `#[rustler::nif]` (normal scheduler)
   - **Fix**: Add `schedule = "DirtyIo"` (I/O-bound)

2. **`extract_bytes_sync` (line 3459)**: calls `kreuzberg::extract_bytes_sync`
   - Parsing + extraction; easily >10ms
   - Currently: `#[rustler::nif]` (normal scheduler)
   - **Fix**: Add `schedule = "DirtyCpu"` (CPU-bound)

3. **`embed_texts` (line 3710)**: embedding inference
   - Neural network forward pass; 100ms+
   - Currently: `#[rustler::nif]` (normal scheduler)
   - **Fix**: Add `schedule = "DirtyCpu"` (CPU-bound)

4. **`render_pdf_page_to_png` (line 3685)**: PDF rendering
   - Complex graphics operation; 50-500ms
   - Currently: `#[rustler::nif]` (normal scheduler)
   - **Fix**: Add `schedule = "DirtyCpu"` (CPU-bound)

#### Already Correct (3 NIFs)

These have proper scheduling:

- `extract_bytes_async` (line 3302): `schedule = "DirtyCpu"` ✓
- `extract_file_async` (line 3369): `schedule = "DirtyCpu"` ✓
- `embed_texts_async` (line 3646): `schedule = "DirtyCpu"` ✓

#### All Other Quick NIFs (<1ms)

These are fast metadata/lookup operations, so leaving them on the normal scheduler is correct:

- `detect_mime_type_from_bytes`, `get_extensions_for_mime`
- `list_*_backends`, `list_document_extractors`, `list_renderers`, `list_post_processors`, `list_validators`
- `get_embedding_preset`, `list_embedding_presets`
- Registry management: `register_*`, `unregister_*`, `clear_*`

---

### BINDING_BUG #2: Thread Panic Not Safely Handled

**Severity**: CRITICAL

**Issue**: `.join()` collapses a worker-thread panic into a generic string error and discards the payload, and nothing in the NIF body guards against a panic raised outside the worker thread.

**Lines**:

- 3331 (`extract_bytes_async`): `.map_err(|_| "thread panicked".to_string())?`
- 3397 (`extract_file_async`): `.map_err(|_| "thread panicked".to_string())?`
- 3665 (`embed_texts_async`): `.map_err(|_| "thread panicked".to_string())?`

**Root Cause**: The Rust threads spawned at lines 3313-3331, 3379-3397, and 3654-3665 can panic:

- Inside the `kreuzberg::extract_bytes()` / `extract_file()` / `embed_texts()` call running on the async runtime
- If the Tokio runtime itself panics

Thread-creation failure is a separate case: `std::thread::Builder::spawn` reports it as an `io::Result` error rather than a panic, and the existing `.map_err(|e| e.to_string())?` after `.spawn()` already handles it.

**Current Behavior**: `.map_err(|_| ...)` silently discards the panic payload, so every worker panic surfaces as the same "thread panicked" string. A panic inside the worker stays on that thread and is reported through `.join()` as `Err`. The unguarded case is a panic on the calling NIF thread, before the spawn or after the join. That path relies entirely on Rustler's built-in panic handling rather than on anything in this crate; if the unwind did reach the FFI boundary, it would crash the BEAM VM.

**Fix**: Wrap the thread block with `std::panic::catch_unwind()` and preserve the panic message, or ensure the Rust code never panics.
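One way to keep the panic details that `.map_err(|_| ...)` currently discards is to downcast the payload that `.join()` returns. The sketch below uses only the standard library. `run_isolated` and `panic_message` are hypothetical helper names, not functions in `kreuzberg_nif`, and the 32 MiB stack size simply mirrors the value used by the existing async NIFs.

```rust
use std::any::Any;
use std::thread;

/// Turn a panic payload into a readable message.
/// `panic!("literal")` carries a `&'static str`; `panic!("fmt {}", x)` carries a `String`.
fn panic_message(payload: Box<dyn Any + Send>) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        (*s).to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "unknown panic payload".to_string()
    }
}

/// Hypothetical helper: run `work` on a dedicated thread and report
/// spawn failures and worker panics as `Err(String)`.
fn run_isolated<T, F>(work: F) -> Result<T, String>
where
    F: FnOnce() -> Result<T, String> + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(32 * 1024 * 1024)
        .spawn(work)
        // `Builder::spawn` returns `io::Result`, so a failed spawn is an error value.
        .map_err(|e| format!("failed to spawn worker thread: {e}"))?
        .join()
        // `join` returns `Err(payload)` if the worker panicked.
        .map_err(|payload| format!("thread panicked: {}", panic_message(payload)))?
}
```

With this helper, `run_isolated(|| -> Result<u32, String> { panic!("boom") })` returns `Err("thread panicked: boom")`. The worker's panic never leaves the worker thread, and the caller still learns what went wrong.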
For the extraction path, the `catch_unwind` wrapper would look roughly like this:

```rust
let result = std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
    // Convert both error types to String so `?` works inside the closure.
    let rt = tokio::runtime::Runtime::new().map_err(|e| e.to_string())?;
    rt.block_on(async { kreuzberg::extract_bytes(&content, &mime_type, config).await })
        .map_err(|e| e.to_string())
}));
// `result` is `Err(payload)` if the closure panicked; map that to an
// error value instead of letting the unwind continue.
```

---

### BINDING_BUG #3: Error Tuple Type Inconsistency

**Severity**: MEDIUM

**Issue**: NIFs return `Result<T, String>`, but the Elixir wrappers expect `{:ok, T} | {:error, atom, String}`.

**Evidence**:

- All `kreuzberg_nif` functions return `Result<T, String>` (lines 3421-3727)
- The NIFs are registered for `Kreuzberg.Native` through `rustler::init!`, and Rustler encodes `Result<T, E>` as `{:ok, T}` or `{:error, E}`
- **BUT** the spec in `Kreuzberg.ex` line 10 shows: `{:ok, map()} | {:error, atom, String.t()}`

**Root Cause**: When Rustler encodes `Err(msg: String)`, it becomes `{:error, "msg"}` (2-tuple), not `{:error, :some_atom, "msg"}` (3-tuple).

**Evidence of Issue**: Lines 3331, 3397, and 3665 return the generic "thread panicked" string, but should return proper error atoms.

**Fix**: Use a custom error type or explicit atom encoding. Because Rustler encodes `Err(e)` as `{:error, e}`, an `Err((kind, msg))` pair would become `{:error, {kind, msg}}`; to get the flat 3-tuple, either the NIF builds that tuple itself or the spec changes to the nested form.

```rust
// `NifUnitEnum` encodes each unit variant as a snake_case atom,
// e.g. `NifError::ThreadPanicked` becomes `:thread_panicked`.
#[derive(rustler::NifUnitEnum)]
enum NifError {
    ThreadPanicked,
    ThreadJoinFailed,
    ...
}
```

---

### ALEF_GAP: Missing Dialyzer Configuration

**Severity**: HIGH

**Issue**: No dialyxir/Dialyzer setup in `packages/elixir/mix.exs`.

**Current State**:

- `mix.exs` (lines 31-39) has `credo` but no `:dialyxir`
- No `.dialyzer_ignore_warnings` or `.dialyzer.yml`
- Elixir specs in `Kreuzberg.ex` and `Kreuzberg.Native` are not validated

**Why This Matters**:

- Rustler auto-generates Elixir wrappers; type mismatches occur silently
- Plugin registration functions (`register_ocr_backend`, etc.) take a `pid()` and their specs declare a `:ok | :error` return, but nothing typechecks either side
- Without `:dialyxir`, errors at call sites go undetected

**Fix**:

1. Add to `mix.exs` deps: `{:dialyxir, "~> 1.4", only: [:dev, :test], runtime: false}`
2. Add to project config: `dialyzer: [plt_add_apps: [:stdlib, :kernel, :rustler]]`
3. Run `mix dialyzer` in CI

---

### TEST_FIXTURE: Weak Error Path Testing

**Severity**: MEDIUM

**Issue**: The `e2e/elixir/` tests cover the happy path but do not check error handling thoroughly.

**Evidence**:

- `async_test.exs` lines 22-30: only checks `{:error, _}` and doesn't validate the error structure
- No tests for thread panics in extraction (would hang or crash)
- No tests for invalid config JSON parsing errors

**Example**:

```elixir
# Current: too loose
assert {:error, _} = Kreuzberg.extract_bytes_async(content, "application/x-nonexistent", "{}")

# Should be: validate error structure
assert {:error, error_msg} =
         Kreuzberg.extract_bytes_async(content, "application/x-nonexistent", "{}")

# "Unsupported" also matches "UnsupportedFormat"
assert error_msg =~ "Unsupported"
```

---

## Commits Needed

### 1. Fix Long-Running NIF Scheduling (4 NIFs)

**File**: `packages/elixir/native/kreuzberg_nif/src/lib.rs`

```diff
-#[rustler::nif]
+#[rustler::nif(schedule = "DirtyIo")]
 pub fn extract_file_sync(
     path: String,
     mime_type: Option<String>,
     config: Option<String>,
 ) -> Result<…, String> {

-#[rustler::nif]
+#[rustler::nif(schedule = "DirtyCpu")]
 pub fn extract_bytes_sync(
     content: rustler::Binary,
     mime_type: String,
     config: Option<String>,
 ) -> Result<…, String> {

-#[rustler::nif]
+#[rustler::nif(schedule = "DirtyCpu")]
 pub fn render_pdf_page_to_png(
     pdf_bytes: rustler::Binary,
     page_index: usize,
     dpi: Option<…>,
     password: Option<String>,
 ) -> Result<Vec<u8>, String> {

-#[rustler::nif]
+#[rustler::nif(schedule = "DirtyCpu")]
 pub fn embed_texts(texts: Vec<String>, config: Option<String>) -> Result<Vec<Vec<…>>, String> {
```

### 2. Fix Thread Panic Handling (3 NIFs)

**File**: `packages/elixir/native/kreuzberg_nif/src/lib.rs`

Wrap each `std::thread::Builder::new()...spawn()` block with panic-safe error handling.
Example for `extract_bytes_async`:

```diff
 #[rustler::nif(schedule = "DirtyCpu")]
 pub fn extract_bytes_async(
     content: rustler::Binary,
     mime_type: String,
     config: Option<String>,
 ) -> Result<…, String> {
     let content: Vec<u8> = content.as_slice().to_vec();
     let config_core: Option<…> = config
         .map(|s| serde_json::from_str::<…>(&s))
         .transpose()
         .map_err(|e| e.to_string())?;
+
+    let result = std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
     std::thread::Builder::new()
         .stack_size(32 * 1024 * 1024)
         .spawn(move || {
             let rt = tokio::runtime::Runtime::new().map_err(|e| e.to_string())?;
             let result = rt
                 .block_on(async {
                     kreuzberg::extract_bytes(
                         &content,
                         &mime_type,
                         config_core.as_ref().unwrap_or(&Default::default()),
                     )
                     .await
                 })
                 .map_err(|e| e.to_string())?;
             Ok(result.into())
         })
         .map_err(|e| e.to_string())?
         .join()
         .map_err(|_| "thread panicked".to_string())?
+    }));
+
+    match result {
+        Ok(inner_result) => inner_result,
+        Err(_) => Err("thread panicked during extraction".to_string()),
+    }
 }
```

### 3. Add Dialyzer Configuration

**File**: `packages/elixir/mix.exs`

```diff
 defp deps do
   [
     {:jason, "~> 1.4"},
     {:rustler, "~> 0.37.0", runtime: false},
     {:rustler_precompiled, "~> 0.9"},
     {:credo, "~> 1.7", only: [:dev, :test], runtime: false},
+    {:dialyxir, "~> 1.4", only: [:dev, :test], runtime: false},
     {:ex_doc, "~> 0.40", only: :dev, runtime: false}
   ]
 end

 def project do
   [
     app: :kreuzberg,
     version: "5.0.0-rc.3",
     elixir: "~> 1.14",
     elixirc_paths: ["lib", Path.expand("../../packages/elixir/native/kreuzberg_nif/src", __DIR__)],
     rustler_crates: [
       kreuzberg_nif: [
         mode: :release,
         targets: ~w(aarch64-apple-darwin aarch64-unknown-linux-gnu x86_64-unknown-linux-gnu x86_64-pc-windows-gnu)
       ]
     ],
     description: "High-performance document intelligence library",
+    dialyzer: [
+      plt_add_apps: [:stdlib, :kernel, :rustler]
+    ],
     package: package(),
     deps: deps()
   ]
 end
```

### 4. Update Native.ex Error Type Specs (Optional Breaking Change for v5)

Since the v5 RC cycle allows breaking changes, fix the error tuple spec:

**File**: `packages/elixir/lib/kreuzberg/native.ex`

Ensure every `def` stub's spec matches the error tuple the NIFs actually return: the 2-tuple `{:error, msg}` today, or the 3-tuple form only once the explicit atom encoding from BINDING_BUG #3 is in place.

---

## Test Status

**Current**: 28/28 e2e tests pass
**After fixes**: Should remain 28/28 pass

Commits 1-3 are internal safety and scheduling improvements; they don't change the public API contract. Only the optional spec change in commit 4 is breaking. Tests continue to pass, but the NIF implementation becomes:

- Non-blocking for the BEAM scheduler
- Safe against panics
- Type-checked with Dialyzer

---

## Verification Steps

1. **Run e2e before fix**:

   ```bash
   task elixir:e2e
   ```

   Expected: 28/28 pass

2. **Apply fixes to NIF**

3. **Rebuild and test**:

   ```bash
   cd packages/elixir
   KREUZBERG_BUILD=1 mix deps.get
   KREUZBERG_BUILD=1 mix compile
   cd ../../e2e/elixir
   KREUZBERG_BUILD=1 mix deps.get
   mix test
   ```

   Expected: 28/28 pass

4. **Add Dialyzer**:

   ```bash
   cd packages/elixir
   mix dialyzer
   ```

   Expected: No errors (type-safe)

---

## Root Causes

| Bug | Root Cause | Why It Happened |
|-----|------------|-----------------|
| Long-running NIFs without dirty scheduling | No scheduler review before alef regeneration | Generated code assumed all NIFs are quick; extraction/embedding ops were not CPU-profiled |
| Thread panic handled unsafely | Incomplete error wrapping in template | The `.join()` error was caught, but a panic unwinding outside the join was not guarded |
| No Dialyzer | CI doesn't require type checking | Project focuses on unit/e2e tests, leaving a static analysis gap |

---

## References

- Rustler docs
- BEAM scheduler (see the `schedule` param)
- Elixir NIF best practices