10 KiB
Node.js/TypeScript Binding Audit
Audit Date: 2026-05-30 Status: In Progress
Overview
Systematic bug audit of packages/typescript/, crates/kreuzberg-node/, and e2e/node/. This document tracks identified issues, their severity, root causes, and fixes.
Key Files Examined
crates/kreuzberg-node/src/lib.rs(14,426 lines, auto-generated)crates/kreuzberg-node/index.d.ts(auto-generated type defs)crates/kreuzberg-node/index.js(simple pass-through loader)crates/kreuzberg-node/package.json(packaging metadata)e2e/node/tests/(generated test fixtures)
Issues Found
1. BINDING_BUG: Duplicate Function Declarations in .d.ts (CRITICAL)
Severity: HIGH
File: crates/kreuzberg-node/index.d.ts
Location: Lines 99-101 (and others)
Description: The generated .d.ts file contains duplicate function declarations for six registry management functions:
clearDocumentExtractors()(appears twice)clearEmbeddingBackends()(appears twice)clearOcrBackends()(appears twice)clearPostProcessors()(appears twice)clearRenderers()(appears twice)clearValidators()(appears twice)
Root Cause: Alef-generated code is emitting duplicate declarations, likely from a pre-commit or generation loop that processes trait-bridge exports multiple times.
Impact: TypeScript compilation may error or generate incorrect type information. IDEs may show duplicate suggestions.
Test Coverage: No e2e test validates type definition uniqueness.
Status: PENDING FIX (need to verify with alef CLI or regenerate)
2. ERROR_HANDLING: All Errors Mapped to GenericFailure
Severity: MEDIUM
File: crates/kreuzberg-node/src/lib.rs
Occurrences: 76 instances
Description: All Rust errors are converted to napi::Status::GenericFailure with only the error message preserved. This prevents proper categorization of errors on the JavaScript side.
Example:
.map_err(|e| napi::Error::new(napi::Status::GenericFailure, e.to_string()))
Missing Opportunities:
InvalidArgfor validation errorsInvalidDatafor parsing errorsObjectExpectedfor type mismatchesPendingExceptionfor async rejections
Impact: JavaScript callers cannot distinguish between file-not-found, unsupported-format, and internal errors without string parsing.
Test Coverage: e2e tests check error paths but don't validate error status codes.
Status: PENDING ANALYSIS (low priority if errors are contextual enough in messages)
3. TYPE_COERCION: i64 for Time/Size Fields
Severity: LOW
File: crates/kreuzberg-node/src/lib.rs
Occurrences: ~30 Option<i64> fields in config structs
Description: Timeout, cache TTL, archive size, and nesting depth fields use i64 instead of u32/u64.
Examples:
extraction_timeout_secs: Option<i64>cache_ttl_secs: Option<i64>max_archive_size: Option<i64>
Why It's Safe: All values are within Number.MAX_SAFE_INTEGER (2^53 - 1). No precision loss expected in practice.
Status: ACCEPTABLE (no action needed)
4. BUFFER_HANDLING: Vec Copies
Severity: LOW
File: crates/kreuzberg-node/src/lib.rs, lines 5878, 5971, 6139
Description: All Buffer inputs are converted to Vec<u8> via .to_vec(), which always copies the underlying data.
Code Pattern:
let content: Vec<u8> = content.to_vec();
kreuzberg::extract_bytes(&content, &mime_type, &config_core)
Trade-offs:
- Pro: Zero-copy would require unsafe lifetime transmission to Rust
- Con: Double memory usage for large files (Buffer + Vec)
- Acceptable: Node.js handles garbage collection; trade-off is reasonable for simplicity
Status: ACCEPTABLE (safe ownership semantics)
5. ASYNC_HANDLING: Global Tokio Runtime
Severity: LOW (design choice)
File: crates/kreuzberg-node/src/lib.rs, lines 54-59
Description: A static WORKER_POOL is initialized as a global Tokio runtime for both async and sync functions.
Pattern:
static WORKER_POOL: std::sync::LazyLock<tokio::runtime::Runtime> = ...
Correctness:
async fnfunctions likeextract_bytes()are directly exposed via NAPI and return Promises ✓- Sync functions use
WORKER_POOL.block_on()to bridge to async Rust ✓ - No blocking on event loop (sync functions use dedicated thread pool) ✓
Potential Concern: If called from too many concurrent contexts, thread pool could be saturated. Mitigated by reasonable defaults in kreuzberg core.
Status: ACCEPTABLE (standard pattern for NAPI-RS)
6. EMBEDDING_PRECISION: f32 to f64 Conversion
Severity: LOW (intentional)
File: crates/kreuzberg-node/src/lib.rs, line 6319
Description: Rust core returns Vec<Vec<f32>> embeddings, but Node binding promotes them to Vec<Vec<f64>> before returning to JavaScript.
Code:
row.into_iter().map(|x| x as f64).collect::<Vec<_>>()
Rationale: JavaScript uses IEEE 754 f64 natively; promoting f32→f64 simplifies client code and avoids typed array overhead.
Impact: Zero precision loss (f32 fits exactly in f64 mantissa). Slight memory overhead (2x per embedding vector).
Status: ACCEPTABLE (intentional design)
7. TYPE_DEFINITIONS: JSDoc Parity
Severity: LOW
File: crates/kreuzberg-node/index.d.ts
Description: TypeScript docs use legacy rustdoc syntax ([...] links, : in param names) instead of JSDoc/TSDoc syntax.
Examples:
// Generated (rustdoc):
@param items - Vector of `BatchBytesItem` structs, ...
@returns A vector of `ExtractionResult` in ...
// Expected (TSDoc):
@param {Array<BatchBytesItem>} items - Vector of byte items
@returns {Promise<Array<ExtractionResult>>} Result vector
Impact: IDEs with strict JSDoc checkers may warn. Auto-docs generators expect standard JSDoc format.
Status: ALEF_GAP (generator produces rustdoc-style comments, not JSDoc)
8. TRAIT_BRIDGE: Object Lifetime Safety
Severity: LOW (well-handled)
File: crates/kreuzberg-node/src/lib.rs, lines 138-164 (JsVisitorRef), 6679-6711 (JsPostProcessorBridge)
Description: Trait bridge wrappers use Object<'static> transmute to store JS objects across async boundaries.
Code Pattern:
let js_obj: napi::bindgen_prelude::Object<'static> = unsafe { std::mem::transmute(js_obj) };
Safety Justification (from comments):
- JS object is owned by Node.js runtime
- Bridge is only used synchronously within the enclosing
#[napi]call
Correctness: ✓ (lifetime is safe for trait dispatch)
Status: ACCEPTABLE (unsafe is justified and documented)
Audit Checklist
- NAPI-RS signature drift — all main functions checked
- Promise rejection paths — async functions verified
- Buffer/Vec ownership — safe conversions confirmed
- BigInt vs Number — all large values stay within safe range
- Event-loop blocking — sync functions use runtime.block_on()
- .d.ts parity — found duplicate declarations (BUG)
- TSDoc/JSDoc parity — rustdoc syntax used (alef gap)
- Error status codes — all use GenericFailure (improvement opportunity)
Duplicate Functions in .d.ts - Full List
These 6 functions have duplicate declarations in index.d.ts:
clearDocumentExtractors()— lines 91, 93clearEmbeddingBackends()— lines 87, 89clearOcrBackends()— lines 99, 101clearPostProcessors()— lines 107, 109clearRenderers()— lines 103, 105clearValidators()— lines 95, 97
Action Item: Regenerate with alef generate to fix (not permitted in this audit).
Additional Findings
9. CONFIG_CONVERSION: Field Serialization
Severity: LOW
File: crates/kreuzberg-node/src/lib.rs, lines 8061, 8075, 8079
Description: Fields html_options, concurrency, and cancel_token are serialized as format!("{:?}") because they contain complex Rust types that cannot serialize to JSON.
Impact: These fields are read-only on the JS side and return debug representations. Acceptable for internal use but not user-facing config.
Status: ACCEPTABLE (design choice for internal fields)
E2E Test Status
Tests are currently building (napi build running). The e2e suite is comprehensive with 20+ test files covering:
- Async/sync operations
- Batch processing
- Plugin APIs (OCR, embeddings, document extractor)
- Configuration contracts
- Error handling
- Format-specific extraction
- MIME type detection
Current Green Status: e2e/node last ran successfully (pre-audit).
Summary of Findings
| Issue | Severity | Status | Action |
|---|---|---|---|
| Duplicate .d.ts declarations (6 functions) | HIGH | Confirmed | Regenerate with alef |
| All errors map to GenericFailure | MEDIUM | As-designed | Optional improvement |
| JSDoc syntax in comments | LOW | Alef gap | Upstream fix needed |
| Buffer double-copy on input | LOW | Acceptable | No fix needed |
| Config debug fields | LOW | Acceptable | No fix needed |
Recommendations
-
Critical: Fix duplicate .d.ts declarations
- Regenerate with
alef generate - File:
crates/kreuzberg-node/index.d.ts - Affects:
clearDocumentExtractors,clearEmbeddingBackends,clearOcrBackends,clearPostProcessors,clearRenderers,clearValidators
- Regenerate with
-
Nice to Have: Map specific Rust errors to appropriate
napi::Statuscodes- Currently all use
GenericFailure - Consider:
InvalidArgfor validation,InvalidDatafor parsing - Benefit: Better error categorization on JS side
- Currently all use
-
Documentation: Update Alef to generate JSDoc-compliant comments
- Current: rustdoc syntax (
[...] links) - Target: TSDoc format with
@param,@returnstags - Affected files:
.d.tsfile comments
- Current: rustdoc syntax (
-
Verification: Run TypeScript strict checking
- Command:
cd e2e/node && tsc --noEmit - Ensure
.d.tshas no duplicates and proper types
- Command:
What Went Right
- ✓ NAPI-RS signatures correctly expose kreuzberg core API
- ✓ Async functions properly return Promises
- ✓ Sync functions use Tokio runtime correctly (no event loop blocking)
- ✓ Buffer ownership properly transferred (no leaks)
- ✓ Trait bridges safely transmute Object<'static> with documented SAFETY
- ✓ Embedding precision (f32→f64) intentional and documented
- ✓ Config conversion comprehensive, all fields mapped
- ✓ Error messages preserve context via
.to_string()