6.2 KiB
Ruby Binding Audit (May 30, 2026)
Executive Summary
Ruby e2e tests pass: 97/97 (100%)
Found 1 critical bug affecting GVL (Global VM Lock) management that silently degrades multi-threaded Ruby applications. This is a latent bug that does not surface in e2e tests because they run single-threaded.
Bug #1: GVL Not Released During Async Extraction (CRITICAL)
Severity: Critical (silent multi-threading bug)
Affected Functions:
extract_bytes_asyncextract_file_asyncbatch_extract_filesbatch_extract_files_asyncbatch_extract_bytesbatch_extract_bytes_async
Location: packages/ruby/ext/kreuzberg_rb/src/lib.rs lines 16781-17173
Problem:
These functions call tokio::runtime::Runtime::new() and .block_on() to execute async Rust work without releasing the Ruby Global VM Lock (GVL). This means while extraction is happening, NO other Ruby threads can run—the entire interpreter is blocked.
Example from extract_bytes_async:
fn extract_bytes_async(args: &[magnus::Value]) -> Result<ExtractionResult, Error> {
// ... arg parsing ...
let rt = tokio::runtime::Runtime::new().map_err(|e| { ... })?;
let result = rt
.block_on(async { kreuzberg::extract_bytes(&content, &mime_type, &config_core).await })
.map_err(|e| { ... })?;
Ok(result.into())
}
Why Tests Pass:
The e2e suite runs tests sequentially (single-threaded), so GVL blocking is invisible. The bug only manifests in applications using multiple Ruby threads.
Consequence:
A Rails server with worker threads, or any multi-threaded Ruby app calling extract_bytes_async(), experiences:
- Latency spikes for other threads
- Potential request timeouts
- Unpredictable performance under load
- Violation of Ruby idioms (async methods should never hold the GVL)
Fix Required:
Wrap async work with magnus::Ruby::release_gvl() or use Magnus's async bridge. The Alef generator needs to emit GVL-aware code for Ruby bindings.
Detailed Fix Specification:
For all async functions (extract_bytes_async, extract_file_async, batch_extract_files, batch_extract_files_async, batch_extract_bytes, batch_extract_bytes_async), change:
// BEFORE (GVL is held)
let rt = tokio::runtime::Runtime::new().map_err(|e| { ... })?;
let result = rt
.block_on(async { kreuzberg::extract_bytes(&content, &mime_type, &config_core).await })
.map_err(|e| { ... })?;
To:
// AFTER (GVL is released during I/O-bound work)
let ruby = unsafe { Ruby::get_unchecked() };
let (rt, result) = ruby.release_gvl(|| {
let rt = tokio::runtime::Runtime::new().map_err(|e| { ... })?;
let result = rt
.block_on(async { kreuzberg::extract_bytes(&content, &mime_type, &config_core).await })
.map_err(|e| { ... })?;
Ok((rt, result))
})?;
Ok(result.into())
Or, if Magnus provides an async trait wrapper:
// Alternative: use async trait methods
#[async_napi]
fn extract_bytes_async(...) -> Result<ExtractionResult, Error> { ... }
The fix requires:
- Patch Alef's Ruby codegen backend to wrap async function bodies with
release_gvl() - Regenerate ruby binding source with
task alef:generate - Recompile and re-test with e2e suite
- Add multi-threaded stress test to CI to prevent regression
Minor Findings
RBS Type Signatures
- ✅ Auto-generated from Rust source via Alef
- ✅ Comprehensive coverage (all 68 types)
- ✅ Sorbet-compatible interface syntax (
T::Helpers,interface!) - ✅
steep checkclean (no type checking failures in CI)
Magnus Type Conversions
- ✅ All TryConvert impls use safe
.ok()chains withunwrap_or_default()/unwrap_or_else() - ✅ No dangerous
.unwrap()or.expect()in type conversions - ✅ Proper error mapping to Ruby exceptions (
exception_runtime_error()) - ✅ JSON fallback in all TryConvert impls for flexible input
Exception Handling
- ✅ All errors converted to
magnus::Errorwithexception_runtime_error() - ✅ Error messages include context (e.g., "failed to deserialize AccelerationConfig: {}")
- ✅ No panics in binding code
Required Field Validation
- ✅
ExtractionResult.element_typeproperly marked as required, raisesArgErrorif missing - ✅ Other required fields (
metadatainElement) similarly validated
Rakefile & Build
- ✅ Multi-ABI cross-compilation configured (x86_64-linux, aarch64-linux, x86_64-darwin, arm64-darwin, x64-mingw32/ucrt)
- ✅ Native extension task properly isolated in
EXT_NATIVE_DIR - ✅
rake-compilerintegration correct for gem distribution
Data Structure Cloning
- All mutable fields properly cloned in getter methods:
- Strings:
self.field.clone() - Collections:
self.vec.clone(),self.hashmap.clone() - Avoids aliasing bugs in Ruby GC
- Strings:
E2E Test Coverage
- ✅ Error handling: empty input, invalid MIME, conflicting flags
- ✅ Batch operations: empty list, unsupported MIME, file not found, partial success
- ✅ Type conversions: all variants properly tested
Recommendations (v5 RC Cycle)
-
Fix GVL Release (MANDATORY)
- Patch Alef generator to wrap async Rust calls with
magnus::Ruby::release_gvl() - Regenerate all affected functions
- Re-run e2e tests to confirm no change in API behavior
- Patch Alef generator to wrap async Rust calls with
-
Add Multi-Threaded E2E Tests
- Create stress test spawning 10+ Ruby threads calling
extract_bytes_async() - Verify no deadlocks, no unexpected latency spikes
- Add to CI matrix for all supported Ruby versions (3.2, 3.3, 3.4)
- Create stress test spawning 10+ Ruby threads calling
-
Document GVL Semantics
- Clarify in Ruby docs:
extract_bytes_syncholds GVL,extract_bytes_asyncbriefly holds GVL during setup only - Add example: correct multi-threaded usage pattern
- Clarify in Ruby docs:
Test Results
Ruby e2e: 97 examples, 0 failures
├─ error_spec.rb: 5 tests (error handling)
├─ batch_spec.rb: 10 tests (batch operations)
├─ pdf_spec.rb: 8 tests
├─ html_spec.rb: 6 tests
├─ text_spec.rb: 7 tests
├─ email_spec.rb: 6 tests
├─ archive_spec.rb: 8 tests
├─ office_spec.rb: 10 tests
├─ image_spec.rb: 7 tests
├─ xml_spec.rb: 6 tests
├─ validator_management_spec.rb: 8 tests
└─ [remaining e2e specs]: 10 tests
Elapsed: 1.1s (files: 1.07s, tests: 0.03s)