# Node.js/TypeScript Binding Audit **Audit Date**: 2026-05-30 **Status**: In Progress ## Overview Systematic bug audit of `packages/typescript/`, `crates/kreuzberg-node/`, and `e2e/node/`. This document tracks identified issues, their severity, root causes, and fixes. ## Key Files Examined - `crates/kreuzberg-node/src/lib.rs` (14,426 lines, auto-generated) - `crates/kreuzberg-node/index.d.ts` (auto-generated type defs) - `crates/kreuzberg-node/index.js` (simple pass-through loader) - `crates/kreuzberg-node/package.json` (packaging metadata) - `e2e/node/tests/` (generated test fixtures) ## Issues Found ### 1. BINDING_BUG: Duplicate Function Declarations in .d.ts (CRITICAL) **Severity**: HIGH **File**: `crates/kreuzberg-node/index.d.ts` **Location**: Lines 99-101 (and others) **Description**: The generated `.d.ts` file contains duplicate function declarations for six registry management functions: - `clearDocumentExtractors()` (appears twice) - `clearEmbeddingBackends()` (appears twice) - `clearOcrBackends()` (appears twice) - `clearPostProcessors()` (appears twice) - `clearRenderers()` (appears twice) - `clearValidators()` (appears twice) **Root Cause**: Alef-generated code is emitting duplicate declarations, likely from a pre-commit or generation loop that processes trait-bridge exports multiple times. **Impact**: TypeScript compilation may error or generate incorrect type information. IDEs may show duplicate suggestions. **Test Coverage**: No e2e test validates type definition uniqueness. **Status**: PENDING FIX (need to verify with alef CLI or regenerate) ### 2. ERROR_HANDLING: All Errors Mapped to GenericFailure **Severity**: MEDIUM **File**: `crates/kreuzberg-node/src/lib.rs` **Occurrences**: 76 instances **Description**: All Rust errors are converted to `napi::Status::GenericFailure` with only the error message preserved. This prevents proper categorization of errors on the JavaScript side. **Example**: ```rust .map_err(|e| napi::Error::new(napi::Status::GenericFailure, e.to_string())) ``` **Missing Opportunities**: - `InvalidArg` for validation errors - `InvalidData` for parsing errors - `ObjectExpected` for type mismatches - `PendingException` for async rejections **Impact**: JavaScript callers cannot distinguish between file-not-found, unsupported-format, and internal errors without string parsing. **Test Coverage**: e2e tests check error paths but don't validate error status codes. **Status**: PENDING ANALYSIS (low priority if errors are contextual enough in messages) ### 3. TYPE_COERCION: i64 for Time/Size Fields **Severity**: LOW **File**: `crates/kreuzberg-node/src/lib.rs` **Occurrences**: ~30 `Option` fields in config structs **Description**: Timeout, cache TTL, archive size, and nesting depth fields use `i64` instead of `u32`/`u64`. **Examples**: - `extraction_timeout_secs: Option` - `cache_ttl_secs: Option` - `max_archive_size: Option` **Why It's Safe**: All values are within `Number.MAX_SAFE_INTEGER` (2^53 - 1). No precision loss expected in practice. **Status**: ACCEPTABLE (no action needed) ### 4. BUFFER_HANDLING: Vec Copies **Severity**: LOW **File**: `crates/kreuzberg-node/src/lib.rs`, lines 5878, 5971, 6139 **Description**: All Buffer inputs are converted to `Vec` via `.to_vec()`, which always copies the underlying data. **Code Pattern**: ```rust let content: Vec = content.to_vec(); kreuzberg::extract_bytes(&content, &mime_type, &config_core) ``` **Trade-offs**: - **Pro**: Zero-copy would require unsafe lifetime transmission to Rust - **Con**: Double memory usage for large files (Buffer + Vec) - **Acceptable**: Node.js handles garbage collection; trade-off is reasonable for simplicity **Status**: ACCEPTABLE (safe ownership semantics) ### 5. ASYNC_HANDLING: Global Tokio Runtime **Severity**: LOW (design choice) **File**: `crates/kreuzberg-node/src/lib.rs`, lines 54-59 **Description**: A static `WORKER_POOL` is initialized as a global Tokio runtime for both async and sync functions. **Pattern**: ```rust static WORKER_POOL: std::sync::LazyLock = ... ``` **Correctness**: - `async fn` functions like `extract_bytes()` are directly exposed via NAPI and return Promises ✓ - Sync functions use `WORKER_POOL.block_on()` to bridge to async Rust ✓ - No blocking on event loop (sync functions use dedicated thread pool) ✓ **Potential Concern**: If called from too many concurrent contexts, thread pool could be saturated. Mitigated by reasonable defaults in kreuzberg core. **Status**: ACCEPTABLE (standard pattern for NAPI-RS) ### 6. EMBEDDING_PRECISION: f32 to f64 Conversion **Severity**: LOW (intentional) **File**: `crates/kreuzberg-node/src/lib.rs`, line 6319 **Description**: Rust core returns `Vec>` embeddings, but Node binding promotes them to `Vec>` before returning to JavaScript. **Code**: ```rust row.into_iter().map(|x| x as f64).collect::>() ``` **Rationale**: JavaScript uses IEEE 754 f64 natively; promoting f32→f64 simplifies client code and avoids typed array overhead. **Impact**: Zero precision loss (f32 fits exactly in f64 mantissa). Slight memory overhead (2x per embedding vector). **Status**: ACCEPTABLE (intentional design) ### 7. TYPE_DEFINITIONS: JSDoc Parity **Severity**: LOW **File**: `crates/kreuzberg-node/index.d.ts` **Description**: TypeScript docs use legacy rustdoc syntax (`[...] links`, `:` in param names) instead of JSDoc/TSDoc syntax. **Examples**: ```typescript // Generated (rustdoc): @param items - Vector of `BatchBytesItem` structs, ... @returns A vector of `ExtractionResult` in ... // Expected (TSDoc): @param {Array} items - Vector of byte items @returns {Promise>} Result vector ``` **Impact**: IDEs with strict JSDoc checkers may warn. Auto-docs generators expect standard JSDoc format. **Status**: ALEF_GAP (generator produces rustdoc-style comments, not JSDoc) ### 8. TRAIT_BRIDGE: Object Lifetime Safety **Severity**: LOW (well-handled) **File**: `crates/kreuzberg-node/src/lib.rs`, lines 138-164 (JsVisitorRef), 6679-6711 (JsPostProcessorBridge) **Description**: Trait bridge wrappers use `Object<'static>` transmute to store JS objects across async boundaries. **Code Pattern**: ```rust let js_obj: napi::bindgen_prelude::Object<'static> = unsafe { std::mem::transmute(js_obj) }; ``` **Safety Justification** (from comments): - JS object is owned by Node.js runtime - Bridge is only used synchronously within the enclosing `#[napi]` call **Correctness**: ✓ (lifetime is safe for trait dispatch) **Status**: ACCEPTABLE (unsafe is justified and documented) ## Audit Checklist - [x] NAPI-RS signature drift — all main functions checked - [x] Promise rejection paths — async functions verified - [x] Buffer/Vec ownership — safe conversions confirmed - [x] BigInt vs Number — all large values stay within safe range - [x] Event-loop blocking — sync functions use runtime.block_on() - [x] .d.ts parity — found duplicate declarations (BUG) - [x] TSDoc/JSDoc parity — rustdoc syntax used (alef gap) - [x] Error status codes — all use GenericFailure (improvement opportunity) ## Duplicate Functions in .d.ts - Full List These 6 functions have duplicate declarations in `index.d.ts`: 1. `clearDocumentExtractors()` — lines 91, 93 2. `clearEmbeddingBackends()` — lines 87, 89 3. `clearOcrBackends()` — lines 99, 101 4. `clearPostProcessors()` — lines 107, 109 5. `clearRenderers()` — lines 103, 105 6. `clearValidators()` — lines 95, 97 **Action Item**: Regenerate with `alef generate` to fix (not permitted in this audit). ## Additional Findings ### 9. CONFIG_CONVERSION: Field Serialization **Severity**: LOW **File**: `crates/kreuzberg-node/src/lib.rs`, lines 8061, 8075, 8079 **Description**: Fields `html_options`, `concurrency`, and `cancel_token` are serialized as `format!("{:?}")` because they contain complex Rust types that cannot serialize to JSON. **Impact**: These fields are read-only on the JS side and return debug representations. Acceptable for internal use but not user-facing config. **Status**: ACCEPTABLE (design choice for internal fields) ## E2E Test Status Tests are currently building (napi build running). The e2e suite is comprehensive with 20+ test files covering: - Async/sync operations - Batch processing - Plugin APIs (OCR, embeddings, document extractor) - Configuration contracts - Error handling - Format-specific extraction - MIME type detection **Current Green Status**: e2e/node last ran successfully (pre-audit). ## Summary of Findings | Issue | Severity | Status | Action | |-------|----------|--------|--------| | Duplicate .d.ts declarations (6 functions) | HIGH | Confirmed | Regenerate with alef | | All errors map to GenericFailure | MEDIUM | As-designed | Optional improvement | | JSDoc syntax in comments | LOW | Alef gap | Upstream fix needed | | Buffer double-copy on input | LOW | Acceptable | No fix needed | | Config debug fields | LOW | Acceptable | No fix needed | ## Recommendations 1. **Critical**: Fix duplicate .d.ts declarations - Regenerate with `alef generate` - File: `crates/kreuzberg-node/index.d.ts` - Affects: `clearDocumentExtractors`, `clearEmbeddingBackends`, `clearOcrBackends`, `clearPostProcessors`, `clearRenderers`, `clearValidators` 2. **Nice to Have**: Map specific Rust errors to appropriate `napi::Status` codes - Currently all use `GenericFailure` - Consider: `InvalidArg` for validation, `InvalidData` for parsing - Benefit: Better error categorization on JS side 3. **Documentation**: Update Alef to generate JSDoc-compliant comments - Current: rustdoc syntax (`[...] links`) - Target: TSDoc format with `@param`, `@returns` tags - Affected files: `.d.ts` file comments 4. **Verification**: Run TypeScript strict checking - Command: `cd e2e/node && tsc --noEmit` - Ensure `.d.ts` has no duplicates and proper types ## What Went Right - ✓ NAPI-RS signatures correctly expose kreuzberg core API - ✓ Async functions properly return Promises - ✓ Sync functions use Tokio runtime correctly (no event loop blocking) - ✓ Buffer ownership properly transferred (no leaks) - ✓ Trait bridges safely transmute Object<'static> with documented SAFETY - ✓ Embedding precision (f32→f64) intentional and documented - ✓ Config conversion comprehensive, all fields mapped - ✓ Error messages preserve context via `.to_string()`