Elixir Binding Systematic Bug Audit
Audit Date: 2026-05-30
Repo: packages/elixir/ + e2e/elixir/
Status: 28/28 e2e tests green (before audit)
Executive Summary
Found 2 critical bugs, 1 medium-severity bug, and 3 gaps in the Elixir NIF binding:
- CRITICAL: CPU-bound NIFs lack DirtyCpu scheduling, which blocks BEAM schedulers
- CRITICAL: Thread panics not safely caught, which can crash the BEAM VM
- MEDIUM: Error tuple shape differs between the NIFs and the Elixir specs
- HIGH: Missing Dialyzer config, so type-safety is not validated
- MEDIUM: Weak error-path testing in the e2e suite
- MISSING: No mix_audit in CI
Findings
BINDING_BUG #1: Scheduler Violation — CPU-Bound NIFs Without DirtyCpu
Severity: CRITICAL
Issue: Operations >1ms run on the normal scheduler, blocking the BEAM.
Lines in NIF: packages/elixir/native/kreuzberg_nif/src/lib.rs
CPU-Bound but Unscheduled (MUST FIX)
- `extract_file_sync` (line 3421): calls `kreuzberg::extract_file_sync`
  - Performs I/O + parsing; easily >10ms
  - Currently: `#[rustler::nif]` (normal scheduler)
  - Fix: Add `schedule = "DirtyIo"` (I/O-bound)
- `extract_bytes_sync` (line 3459): calls `kreuzberg::extract_bytes_sync`
  - Parsing + extraction; easily >10ms
  - Currently: `#[rustler::nif]` (normal scheduler)
  - Fix: Add `schedule = "DirtyCpu"` (CPU-bound)
- `embed_texts` (line 3710): embedding inference
  - Neural network forward pass; 100ms+
  - Currently: `#[rustler::nif]` (normal scheduler)
  - Fix: Add `schedule = "DirtyCpu"` (CPU-bound)
- `render_pdf_page_to_png` (line 3685): PDF rendering
  - Complex graphics operation; 50-500ms
  - Currently: `#[rustler::nif]` (normal scheduler)
  - Fix: Add `schedule = "DirtyCpu"` (CPU-bound)
Already Correct (3 NIFs)
These have proper scheduling:
- `extract_bytes_async` (line 3302): `schedule = "DirtyCpu"` ✓
- `extract_file_async` (line 3369): `schedule = "DirtyCpu"` ✓
- `embed_texts_async` (line 3646): `schedule = "DirtyCpu"` ✓
All Other Quick NIFs (<1ms)
These are correctly unscheduled (fast metadata/lookup operations):
- `detect_mime_type_from_bytes`, `get_extensions_for_mime`
- `list_*_backends`, `list_document_extractors`, `list_renderers`, `list_post_processors`, `list_validators`
- `get_embedding_preset`, `list_embedding_presets`
- Registry management: `register_*`, `unregister_*`, `clear_*`
These are <1ms operations; normal scheduler is fine.
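The triage rule used in this finding (stay on the normal scheduler below roughly 1ms, otherwise pick DirtyIo for I/O-bound work and DirtyCpu for compute-bound work) can be written down as a small helper. This is a minimal std-only sketch: `Schedule`, `NORMAL_BUDGET`, `recommend_schedule`, and `time_once` are hypothetical names for illustration and are not part of Rustler or the binding.

```rust
use std::time::{Duration, Instant};

/// Scheduler choices discussed in the audit (hypothetical enum, not a Rustler type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Schedule {
    Normal,
    DirtyCpu,
    DirtyIo,
}

/// Budget taken from the audit's rule of thumb: above ~1ms, leave the normal scheduler.
pub const NORMAL_BUDGET: Duration = Duration::from_millis(1);

impl Schedule {
    /// Value to place in `#[rustler::nif(schedule = "...")]`; `None` means a plain `#[rustler::nif]`.
    pub fn attribute_value(self) -> Option<&'static str> {
        match self {
            Schedule::Normal => None,
            Schedule::DirtyCpu => Some("DirtyCpu"),
            Schedule::DirtyIo => Some("DirtyIo"),
        }
    }
}

/// Maps an observed duration and an I/O-bound flag to a scheduler choice.
pub fn recommend_schedule(observed: Duration, io_bound: bool) -> Schedule {
    if observed <= NORMAL_BUDGET {
        Schedule::Normal
    } else if io_bound {
        Schedule::DirtyIo
    } else {
        Schedule::DirtyCpu
    }
}

/// Times a single call; a real profile would take many samples on realistic inputs.
pub fn time_once<T, F: FnOnce() -> T>(work: F) -> (T, Duration) {
    let start = Instant::now();
    let out = work();
    (out, start.elapsed())
}
```

A single timing is only a rough signal; the classification above should be driven by worst-case inputs (large PDFs, long embedding batches), not by the median.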
BINDING_BUG #2: Thread Panic Not Safely Handled
Severity: CRITICAL
Issue: A panic in a worker thread surfaces as an `Err` from `.join()` and is converted to a string error, but a panic that unwinds out of the NIF itself can crash the BEAM.
Lines:
- 3331: `extract_bytes_async`: `.map_err(|_| "thread panicked".to_string())?`
- 3397: `extract_file_async`: `.map_err(|_| "thread panicked".to_string())?`
- 3665: `embed_texts_async`: `.map_err(|_| "thread panicked".to_string())?`
Root Cause: Rust threads spawned at lines 3313-3331, 3379-3397, 3654-3665 can panic if:
- Code inside the `kreuzberg::extract_bytes()` / `extract_file()` / `embed_texts()` async runtime panics
- The Tokio runtime panics or an unwind propagates across the FFI boundary
- `.spawn()` itself panics (thread creation fails)
Current Behavior: The `.map_err(|_| ...)` silently discards the panic details. If a panic occurs in the worker thread, `.join()` returns `Err`, which is converted to the generic "thread panicked" string. But if a panic unwinds across the FFI boundary BEFORE `.join()`, the BEAM VM crashes.
Fix: Wrap thread block with std::panic::catch_unwind() or ensure Rust code never panics.
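let result = std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
    // Map runtime-creation and extraction errors to String so the closure has one error type.
    let rt = tokio::runtime::Runtime::new().map_err(|e| e.to_string())?;
    rt.block_on(async {
        kreuzberg::extract_bytes(&content, &mime_type, config).await
    })
    .map_err(|e| e.to_string())
}));
// catch_unwind yields Err(payload) if the closure panicked; fold that into the error string.
let result = result.unwrap_or_else(|_| Err("thread panicked".to_string()));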
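To make the shape of that fix concrete without pulling in Rustler or Tokio, here is a std-only sketch of the same idea: run a fallible closure under `catch_unwind` and flatten "returned an error" and "panicked" into one `Result<T, String>`. The names `run_guarded` and `fake_extract` are hypothetical stand-ins, and the sketch assumes the crate is built with the default `panic = "unwind"` strategy (with `panic = "abort"` nothing can be caught).

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Runs `work` and converts both ordinary errors and panics into `Err(String)`.
pub fn run_guarded<T, F>(work: F) -> Result<T, String>
where
    F: FnOnce() -> Result<T, String>,
{
    match catch_unwind(AssertUnwindSafe(work)) {
        // The closure finished; pass its own Ok/Err through unchanged.
        Ok(inner) => inner,
        // The closure panicked; the payload is dropped here for brevity.
        Err(_payload) => Err("panicked during extraction".to_string()),
    }
}

/// Stand-in for the real extraction call (hypothetical): panics on empty input,
/// returns an error for a leading zero byte, otherwise reports the input length.
pub fn fake_extract(input: &[u8]) -> Result<usize, String> {
    if input.is_empty() {
        panic!("empty input");
    }
    if input[0] == 0 {
        return Err("unsupported format".to_string());
    }
    Ok(input.len())
}
```

`AssertUnwindSafe` is a promise by the caller that no broken invariants are observed after a panic; that holds here because the closure's captured state is discarded once it fails.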
BINDING_BUG #3: Error Tuple Type Inconsistency
Severity: MEDIUM
Issue: NIFs return Result<T, String>, but Elixir wrappers expect {:ok, T} | {:error, atom, String}.
Evidence:
- All `kreuzberg_nif` functions return `Result<T, String>` (lines 3421-3727)
- The Elixir `Kreuzberg.Native` module relies on `rustler::init!`, which is assumed to auto-convert `Result<T, String>` to `{:error, Atom, Msg}`
- BUT the spec in `Kreuzberg.ex` line 10 shows: `{:ok, map()} | {:error, atom, String.t()}`
Root Cause: When Rustler encodes Err(msg: String), it becomes {:error, "msg"} (2-tuple), not {:error, :some_atom, "msg"} (3-tuple).
Evidence of Issue: Lines 3331, 3397, and 3665 return a generic "thread panicked" string, but should return proper error atoms.
Fix: Use custom error type or explicit atom encoding:
// NifUnitEnum encodes each unit variant as a snake_case atom (e.g. :thread_panicked).
#[derive(rustler::NifUnitEnum)]
enum NifError {
    ThreadPanicked,
    ThreadJoinFailed,
    ...
}
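The 3-tuple the Elixir spec expects pairs an error atom with a message. The sketch below models that pairing in plain Rust so the intended shape is easy to see. It is illustrative only: `ErrorKind`, `NifFailure`, `atom_name`, and `as_elixir_term` are hypothetical, the `InvalidConfig` variant is an assumption added for the example, and in the real binding Rustler would perform the encoding instead of a formatted string.

```rust
/// Error categories that would become atoms on the Elixir side (hypothetical set).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    ThreadPanicked,
    ThreadJoinFailed,
    InvalidConfig, // assumption: not named in the audit's enum
}

impl ErrorKind {
    /// snake_case name of the atom this kind would map to.
    pub fn atom_name(self) -> &'static str {
        match self {
            ErrorKind::ThreadPanicked => "thread_panicked",
            ErrorKind::ThreadJoinFailed => "thread_join_failed",
            ErrorKind::InvalidConfig => "invalid_config",
        }
    }
}

/// An error carrying both a machine-readable kind and a human-readable message.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NifFailure {
    pub kind: ErrorKind,
    pub message: String,
}

impl NifFailure {
    pub fn new(kind: ErrorKind, message: impl Into<String>) -> Self {
        NifFailure { kind, message: message.into() }
    }

    /// Textual rendering of the target tuple, for logs and documentation only.
    pub fn as_elixir_term(&self) -> String {
        format!("{{:error, :{}, {:?}}}", self.kind.atom_name(), self.message)
    }
}
```

Keeping the kind separate from the message lets Elixir callers pattern-match on the atom while still logging the free-form text.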
ALEF_GAP: Missing Dialyzer Configuration
Severity: HIGH
Issue: No dialyxir/Dialyzer setup in packages/elixir/mix.exs.
Current State:
- `mix.exs` (lines 31-39) has `credo` but no `:dialyxir`
- No `.dialyzer_ignore_warnings` or `.dialyzer.yml`
- Elixir specs in `Kreuzberg.ex` and `Kreuzberg.Native` are not validated
Why This Matters:
- Rustler auto-generates Elixir wrappers; type mismatches silently occur
- Plugin registration functions (`register_ocr_backend`, etc.) use `pid()` but the spec says they return `:ok | :error`, with no typecheck
- Missing `:dialyxir` means caller errors go undetected
Fix:
- Add to `mix.exs` deps: `{:dialyxir, "~> 1.4", only: [:dev, :test], runtime: false}`
- Add to project config: `dialyzer: [plt_add_apps: [:stdlib, :kernel]]`
- Run `mix dialyzer` in CI
TEST_FIXTURE: Weak Error Path Testing
Severity: MEDIUM
Issue: e2e/elixir/ tests check happy path but not error handling thoroughly.
Evidence:
- `async_test.exs` lines 22-30: Only checks `{:error, _}`; doesn't validate error structure
- No tests for thread panics in extraction (would hang or crash)
- No tests for invalid config JSON parsing errors
Example:
# Current: too loose
assert {:error, _} = Kreuzberg.extract_bytes_async(content, "application/x-nonexistent", "{}")
# Should be: validate error structure
assert {:error, error_msg} = Kreuzberg.extract_bytes_async(content, "application/x-nonexistent", "{}")
assert String.contains?(error_msg, "UnsupportedFormat") or String.contains?(error_msg, "Unsupported")
Commits Needed
1. Fix CPU-Bound NIF Scheduling (4 NIFs)
File: packages/elixir/native/kreuzberg_nif/src/lib.rs
-#[rustler::nif]
+#[rustler::nif(schedule = "DirtyIo")]
 pub fn extract_file_sync(
     path: String,
     mime_type: Option<String>,
     config: Option<String>,
 ) -> Result<ExtractionResult, String> {

-#[rustler::nif]
+#[rustler::nif(schedule = "DirtyCpu")]
 pub fn extract_bytes_sync(
     content: rustler::Binary,
     mime_type: String,
     config: Option<String>,
 ) -> Result<ExtractionResult, String> {

-#[rustler::nif]
+#[rustler::nif(schedule = "DirtyCpu")]
 pub fn render_pdf_page_to_png(
     pdf_bytes: rustler::Binary,
     page_index: usize,
     dpi: Option<i32>,
     password: Option<String>,
 ) -> Result<Vec<u8>, String> {

-#[rustler::nif]
+#[rustler::nif(schedule = "DirtyCpu")]
 pub fn embed_texts(texts: Vec<String>, config: Option<String>) -> Result<Vec<Vec<f32>>, String> {
2. Fix Thread Panic Handling (3 NIFs)
File: packages/elixir/native/kreuzberg_nif/src/lib.rs
Wrap each std::thread::Builder::new()...spawn() block with panic-safe error handling. Example for extract_bytes_async:
 #[rustler::nif(schedule = "DirtyCpu")]
 pub fn extract_bytes_async(
     content: rustler::Binary,
     mime_type: String,
     config: Option<String>,
 ) -> Result<ExtractionResult, String> {
     let content: Vec<u8> = content.as_slice().to_vec();
     let config_core: Option<kreuzberg::ExtractionConfig> = config
         .map(|s| serde_json::from_str::<kreuzberg::ExtractionConfig>(&s))
         .transpose()
         .map_err(|e| e.to_string())?;
+
+    let result = std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
         std::thread::Builder::new()
             .stack_size(32 * 1024 * 1024)
             .spawn(move || {
                 let rt = tokio::runtime::Runtime::new().map_err(|e| e.to_string())?;
                 let result = rt
                     .block_on(async {
                         kreuzberg::extract_bytes(
                             &content,
                             &mime_type,
                             config_core.as_ref().unwrap_or(&Default::default()),
                         )
                         .await
                     })
                     .map_err(|e| e.to_string())?;
                 Ok(result.into())
             })
             .map_err(|e| e.to_string())?
             .join()
             .map_err(|_| "thread panicked".to_string())?
+    }));
+
+    match result {
+        Ok(inner_result) => inner_result,
+        Err(_) => Err("thread panicked during extraction".to_string()),
+    }
 }
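The spawn/join half of this pattern can be exercised on its own. The std-only sketch below runs a closure on a worker thread with an explicit stack size and, instead of discarding the panic payload as the current `.map_err(|_| ...)` does, recovers the panic message when it is a `&str` or `String`. `run_on_worker` and `panic_message` are hypothetical helper names, not functions in the binding.

```rust
use std::any::Any;
use std::thread;

/// Extracts a readable message from a panic payload when one is available.
pub fn panic_message(payload: Box<dyn Any + Send>) -> String {
    if let Some(s) = payload.downcast_ref::<&'static str>() {
        s.to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "unknown panic payload".to_string()
    }
}

/// Runs `work` on a dedicated thread and flattens spawn failure, panic,
/// and the closure's own error into a single `Result<T, String>`.
pub fn run_on_worker<T, F>(stack_bytes: usize, work: F) -> Result<T, String>
where
    T: Send + 'static,
    F: FnOnce() -> Result<T, String> + Send + 'static,
{
    let handle = thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(work)
        // Builder::spawn reports thread-creation failure as an io::Error, not a panic.
        .map_err(|e| format!("thread spawn failed: {}", e))?;

    match handle.join() {
        // The worker finished; pass through its own Ok/Err.
        Ok(inner) => inner,
        // The worker panicked; join() hands back the payload instead of unwinding further.
        Err(payload) => Err(format!("thread panicked: {}", panic_message(payload))),
    }
}
```

A call such as `run_on_worker(32 * 1024 * 1024, move || do_extract(...))`, with `do_extract` standing in for the real work, mirrors the 32 MiB stack used above. Keeping the payload text makes the eventual `{:error, ...}` far easier to debug than a bare "thread panicked".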
3. Add Dialyzer Configuration
File: packages/elixir/mix.exs
 defp deps do
   [
     {:jason, "~> 1.4"},
     {:rustler, "~> 0.37.0", runtime: false},
     {:rustler_precompiled, "~> 0.9"},
     {:credo, "~> 1.7", only: [:dev, :test], runtime: false},
+    {:dialyxir, "~> 1.4", only: [:dev, :test], runtime: false},
     {:ex_doc, "~> 0.40", only: :dev, runtime: false}
   ]
 end

 def project do
   [
     app: :kreuzberg,
     version: "5.0.0-rc.3",
     elixir: "~> 1.14",
     elixirc_paths: ["lib", Path.expand("../../packages/elixir/native/kreuzberg_nif/src", __DIR__)],
     rustler_crates: [
       kreuzberg_nif: [
         mode: :release,
         targets: ~w(aarch64-apple-darwin aarch64-unknown-linux-gnu x86_64-unknown-linux-gnu x86_64-pc-windows-gnu)
       ]
     ],
     description: "High-performance document intelligence library",
+    dialyzer: [
+      plt_add_apps: [:stdlib, :kernel, :rustler]
+    ],
     package: package(),
     deps: deps()
   ]
 end
4. Update Native.ex Error Type Specs (Optional Breaking Change for v5)
Since v5 RC cycle allows breaking changes, fix the error tuple spec:
File: packages/elixir/lib/kreuzberg/native.ex
Ensure all def stubs and their specs match the error tuple the NIFs actually return: the 3-tuple format once the error-atom fix from BINDING_BUG #3 is in place.
Test Status
Current: 28/28 e2e tests pass
After fixes: Should remain 28/28 pass
The fixes are internal safety improvements and scheduling; they don't change the public API contract. Tests continue to pass but the NIF implementation becomes:
- Non-blocking for BEAM scheduler
- Safe against panics
- Type-checked with Dialyzer
Verification Steps
- Run e2e before fix: `task elixir:e2e`
  Expected: 28/28 pass
- Apply fixes to NIF
- Rebuild and test:
      cd packages/elixir
      KREUZBERG_BUILD=1 mix deps.get
      KREUZBERG_BUILD=1 mix compile
      cd ../../e2e/elixir
      KREUZBERG_BUILD=1 mix deps.get
      mix test
  Expected: 28/28 pass
- Add Dialyzer:
      cd packages/elixir
      mix dialyzer
  Expected: No errors (type-safe)
Root Causes
| Bug | Root Cause | Why It Happened |
|---|---|---|
| CPU-bound without DirtyCpu | No scheduler review before alef regeneration | Generated code assumed all NIFs are quick; extraction/embedding ops not CPU-profiled |
| Thread panic unsafely | Incomplete error wrapping in template | .join() error was caught, but panic unwind before join not guarded |
| No Dialyzer | CI doesn't require type checking | Project focuses on unit/e2e tests; static analysis gap |
References
- Rustler Docs: https://github.com/rusterlium/rustler
- BEAM Scheduler: https://www.erlang.org/doc/man/erl_nif.html (see schedule param)
- Elixir NIF best practices: https://hexdocs.pm/rustler/