Nomad changes
All checks were successful
Deploy fil (kreuzberg) / deploy (push) Successful in 49s

Henrik Jess Nielsen
2026-06-01 23:40:55 +02:00
parent 72b1a0a6ed
commit b4c07d3693
5723 changed files with 1130655 additions and 0 deletions

32
crates/kreuzberg-node/Cargo.toml generated Normal file

@@ -0,0 +1,32 @@
[package]
name = "kreuzberg-node"
version = "5.0.0-rc.3"
edition = "2024"
license = "Elastic-2.0"
description = "High-performance document intelligence library"
readme = false
keywords = ["document", "extraction", "ocr", "pdf", "text"]
categories = ["text-processing"]
# `serde_json` is emitted unconditionally below so the manifest is stable
# across regens, but for umbrella crates with no JSON-marshalled return types
# it is genuinely unused. The conditional `async-trait` / `futures-util` deps
# are similarly flagged when the umbrella has trait-bridge / streaming
# adapters configured but no actual async-trait callsite in this binding.
[package.metadata.cargo-machete]
ignored = ["serde_json", "async-trait"]
[lib]
crate-type = ["cdylib"]
[dependencies]
async-trait = "0.1"
kreuzberg = { version = "5.0.0-rc.3", path = "../kreuzberg", features = ["full", "pdf", "ocr", "paddle-ocr", "paddle-ocr-types", "layout-detection", "layout-types", "embeddings", "embedding-presets", "chunking", "keywords-yake", "keywords-rake", "language-detection", "html", "tree-sitter", "office", "email", "archives", "stopwords", "auto-rotate", "auto-rotate-types", "tokio-runtime", "api", "mcp", "liter-llm", "quality"] }
napi = { version = "3", features = ["async", "serde-json"] }
napi-derive = "3"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
serde_with = "3"
[build-dependencies]
napi-build = "2"

93
crates/kreuzberg-node/LICENSE generated Normal file

@@ -0,0 +1,93 @@
Elastic License 2.0 (ELv2)
Copyright 2025-2026 Kreuzberg, Inc.
Acceptance
By using the software, you agree to all of the terms and conditions below.
Copyright License
The licensor grants you a non-exclusive, royalty-free, worldwide,
non-sublicensable, non-transferable license to use, copy, distribute, make
available, and prepare derivative works of the software, in each case subject to
the limitations and conditions below.
Limitations
You may not provide the software to third parties as a hosted or managed
service, where the service provides users with access to any substantial set of
the features or functionality of the software.
You may not move, change, disable, or circumvent the license key functionality
in the software, and you may not remove or obscure any functionality in the
software that is protected by the license key.
You may not alter, remove, or obscure any licensing, copyright, or other notices
of the licensor in the software. Any use of the licensor's trademarks is subject
to applicable law.
Patents
The licensor grants you a license, under any patent claims the licensor can
license, or becomes able to license, to make, have made, use, sell, offer for
sale, import and have imported the software, in each case subject to the
limitations and conditions in this license. This license does not cover any
patent claims that you cause to be infringed by modifications or additions to the
software. If you or your company make any written claim that the software
infringes or contributes to infringement of any patent, your patent license for
the software granted under these terms ends immediately. If your company makes
such a claim, your patent license ends immediately for work on behalf of your
company.
Notices
You must ensure that anyone who gets a copy of any part of the software from you
also gets a copy of these terms.
If you modify the software, you must include in any modified copies of the
software prominent notices stating that you have modified the software.
No Other Rights
These terms do not imply any licenses other than those expressly granted in
these terms.
Termination
If you use the software in violation of these terms, such use is not licensed,
and your licenses will automatically terminate. If the licensor provides you with
a notice of your violation, and you cease all violation of this license no later
than 30 days after you receive that notice, your licenses will be reinstated
retroactively. However, if you violate these terms after such reinstatement, any
additional violation of these terms will cause your licenses to terminate
automatically and permanently.
No Liability
As far as the law allows, the software comes as is, without any warranty or
condition, and the licensor will not be liable to you for any damages arising out
of these terms or the use or nature of the software, under any kind of legal
claim.
Definitions
The licensor is the entity offering these terms, and the software is the
software the licensor makes available under these terms, including any portion
of it.
you refers to the individual or entity agreeing to these terms.
your company is any legal entity, sole proprietorship, or other kind of
organization that you work for, plus all organizations that have control over,
are under the control of, or are under common control with that organization.
control means ownership of substantially all the assets of an entity, or the
power to direct its management and policies by vote, contract, or otherwise.
Control can be direct or indirect.
your licenses are all the licenses granted to you for the software under these
terms.
use means anything you do with the software requiring one of your licenses.
trademark means trademarks, service marks, and similar rights.

488
crates/kreuzberg-node/README.md generated Normal file

@@ -0,0 +1,488 @@
# TypeScript (Node.js)
<div align="center" style="display: flex; flex-wrap: wrap; gap: 8px; justify-content: center; margin: 20px 0;">
<a href="https://github.com/kreuzberg-dev/alef">
<img src="https://img.shields.io/badge/Bindings-alef%20%D7%90-007ec6" alt="Bindings">
</a>
<!-- Language Bindings -->
<a href="https://crates.io/crates/kreuzberg">
<img src="https://img.shields.io/crates/v/kreuzberg?label=Rust&color=007ec6" alt="Rust">
</a>
<a href="https://pypi.org/project/kreuzberg/">
<img src="https://img.shields.io/pypi/v/kreuzberg?label=Python&color=007ec6" alt="Python">
</a>
<a href="https://www.npmjs.com/package/@kreuzberg/node">
<img src="https://img.shields.io/npm/v/@kreuzberg/node?label=Node.js&color=007ec6" alt="Node.js">
</a>
<a href="https://www.npmjs.com/package/@kreuzberg/wasm">
<img src="https://img.shields.io/npm/v/@kreuzberg/wasm?label=WASM&color=007ec6" alt="WASM">
</a>
<a href="https://central.sonatype.com/artifact/dev.kreuzberg/kreuzberg">
<img src="https://img.shields.io/maven-central/v/dev.kreuzberg/kreuzberg?label=Java&color=007ec6" alt="Java">
</a>
<a href="https://github.com/kreuzberg-dev/kreuzberg/tree/main/packages/go/v5">
<img src="https://img.shields.io/github/v/tag/kreuzberg-dev/kreuzberg?label=Go&color=007ec6&filter=v5*" alt="Go">
</a>
<a href="https://www.nuget.org/packages/Kreuzberg/">
<img src="https://img.shields.io/nuget/v/Kreuzberg?label=C%23&color=007ec6" alt="C#">
</a>
<a href="https://packagist.org/packages/kreuzberg/kreuzberg">
<img src="https://img.shields.io/packagist/v/kreuzberg/kreuzberg?label=PHP&color=007ec6" alt="PHP">
</a>
<a href="https://rubygems.org/gems/kreuzberg">
<img src="https://img.shields.io/gem/v/kreuzberg?label=Ruby&color=007ec6" alt="Ruby">
</a>
<a href="https://hex.pm/packages/kreuzberg">
<img src="https://img.shields.io/hexpm/v/kreuzberg?label=Elixir&color=007ec6" alt="Elixir">
</a>
<a href="https://kreuzberg-dev.r-universe.dev/kreuzberg">
<img src="https://img.shields.io/badge/R-kreuzberg-007ec6" alt="R">
</a>
<a href="https://pub.dev/packages/kreuzberg">
<img src="https://img.shields.io/pub/v/kreuzberg?label=Dart&color=007ec6" alt="Dart">
</a>
<a href="https://central.sonatype.com/artifact/dev.kreuzberg/kreuzberg-android">
<img src="https://img.shields.io/maven-central/v/dev.kreuzberg/kreuzberg-android?label=Kotlin&color=007ec6" alt="Kotlin">
</a>
<a href="https://github.com/kreuzberg-dev/kreuzberg/tree/main/packages/swift">
<img src="https://img.shields.io/badge/Swift-SPM-007ec6" alt="Swift">
</a>
<a href="https://github.com/kreuzberg-dev/kreuzberg/tree/main/packages/zig">
<img src="https://img.shields.io/badge/Zig-package-007ec6" alt="Zig">
</a>
<a href="https://github.com/kreuzberg-dev/kreuzberg/releases">
<img src="https://img.shields.io/badge/C-FFI-007ec6" alt="C FFI">
</a>
<a href="https://github.com/kreuzberg-dev/kreuzberg/pkgs/container/kreuzberg">
<img src="https://img.shields.io/badge/Docker-ghcr.io-007ec6?logo=docker&logoColor=white" alt="Docker">
</a>
<a href="https://github.com/kreuzberg-dev/kreuzberg/pkgs/container/charts%2Fkreuzberg">
<img src="https://img.shields.io/badge/Helm-ghcr.io-007ec6?logo=helm&logoColor=white" alt="Helm">
</a>
<!-- Project Info -->
<a href="https://github.com/kreuzberg-dev/kreuzberg/blob/main/LICENSE">
<img src="https://img.shields.io/badge/License-Elastic--2.0-007ec6" alt="License">
</a>
<a href="https://docs.kreuzberg.dev">
<img src="https://img.shields.io/badge/Docs-kreuzberg-007ec6" alt="Documentation">
</a>
<a href="https://huggingface.co/Kreuzberg">
<img src="https://img.shields.io/badge/Hugging%20Face-Kreuzberg-007ec6" alt="Hugging Face">
</a>
</div>
<div align="center" style="margin: 24px 0 0;">
<a href="https://kreuzberg.dev">
<img alt="Kreuzberg" src="https://github.com/user-attachments/assets/419fc06c-8313-4324-b159-4b4d3cfce5c0" />
</a>
</div>
<div align="center" style="display: flex; flex-wrap: wrap; gap: 12px; justify-content: center; margin: 28px 0 24px;">
<a href="https://discord.gg/xt9WY3GnKR">
<img height="22" src="https://img.shields.io/badge/Discord-Chat-007ec6?logo=discord&logoColor=white" alt="Join Discord">
</a>
<a href="https://docs.kreuzberg.dev/demo.html">
<img height="22" src="https://img.shields.io/badge/Live%20Demo-Open-007ec6?logo=webassembly&logoColor=white" alt="Live Demo">
</a>
</div>
Extract text, tables, images, and metadata from 90+ file formats (including PDF, Office documents, and images) and from source code in 300+ programming languages. Native NAPI-RS bindings give Node.js in-process access to the Rust core, with async/await support and TypeScript type definitions.
## What This Package Provides
- **Document intelligence core** — extract text, tables, images, metadata, entities, keywords, and code intelligence from one API.
- **Format coverage** — PDF, Office, images, HTML/XML, email, archives, notebooks, citations, scientific formats, and plain text.
- **OCR choices** — Tesseract, PaddleOCR, EasyOCR where supported, VLM OCR through liter-llm, and plugin hooks for custom backends.
- **Same engine as every binding** — Rust, Python, Node.js, Go, Java, PHP, Ruby, .NET, Elixir, R, WASM, Kotlin Android, Swift, Dart, Zig, and C FFI share the same Rust implementation.
- **Node-first TypeScript API** — NAPI-RS package with typed options/results and async extraction.
## Installation
### Package Installation
```bash
pnpm add @kreuzberg/node
```
### System Requirements
- **Node.js 22+** required (NAPI-RS native bindings)
- Optional: [ONNX Runtime](https://github.com/microsoft/onnxruntime/releases) version 1.22.x for embeddings support
- Optional: [Tesseract OCR](https://github.com/tesseract-ocr/tesseract) for OCR functionality
### Platform Support
Pre-built binaries are available for:
- macOS (arm64, x64)
- Linux (x64, arm64; glibc and musl)
- Windows (x64, arm64)
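
Package managers pick the matching prebuilt package from the `optionalDependencies` in `package.json`, using each platform package's `os`, `cpu`, and (where the package manager supports it) `libc` fields. The sketch below mirrors that mapping so you can check which package a given host needs. `nativePackageName` and `detectLibc` are hypothetical helpers for illustration, not exports of `@kreuzberg/node`, and the loader that NAPI-RS generates may resolve binaries differently.

```typescript title="TypeScript"
// Hypothetical helper: maps a host platform/arch (plus libc on Linux) to one of
// the prebuilt packages listed in @kreuzberg/node's optionalDependencies.
type Libc = "gnu" | "musl";
type ReportHeader = { header?: { glibcVersionRuntime?: string } };

const PREBUILT_PACKAGES = new Set<string>([
  "@kreuzberg/node-linux-x64-gnu",
  "@kreuzberg/node-linux-arm64-gnu",
  "@kreuzberg/node-linux-x64-musl",
  "@kreuzberg/node-linux-arm64-musl",
  "@kreuzberg/node-darwin-x64",
  "@kreuzberg/node-darwin-arm64",
  "@kreuzberg/node-win32-x64-msvc",
  "@kreuzberg/node-win32-arm64-msvc",
]);

function nativePackageName(platform: string, arch: string, libc: Libc = "gnu"): string | null {
  let suffix: string;
  if (platform === "linux") {
    suffix = `linux-${arch}-${libc}`; // Linux packages are split by C library
  } else if (platform === "win32") {
    suffix = `win32-${arch}-msvc`; // Windows packages are MSVC builds
  } else {
    suffix = `${platform}-${arch}`; // e.g. darwin-arm64
  }
  const name = `@kreuzberg/node-${suffix}`;
  return PREBUILT_PACKAGES.has(name) ? name : null; // null: no prebuilt binary
}

// Only meaningful on Linux: glibc builds of Node.js report a glibc runtime
// version in the diagnostic report header, musl builds do not.
function detectLibc(): Libc {
  const report = process.report?.getReport() as unknown as ReportHeader | undefined;
  return report?.header?.glibcVersionRuntime ? "gnu" : "musl";
}

const libc = process.platform === "linux" ? detectLibc() : undefined;
console.log(nativePackageName(process.platform, process.arch, libc) ?? "no prebuilt binary");
```

For example, `nativePackageName("linux", "arm64", "musl")` returns `"@kreuzberg/node-linux-arm64-musl"`, and a platform without a prebuilt package returns `null`.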
## Quick Start
### Basic Extraction
Extract text, metadata, and structure from any supported document format:
```typescript title="TypeScript"
import { extractFileSync } from "@kreuzberg/node";
const config = {
useCache: true,
enableQualityProcessing: true,
};
const result = extractFileSync("document.pdf", null, config);
console.log(result.content);
console.log(`MIME Type: ${result.mimeType}`);
```
### Common Use Cases
#### Extract with Custom Configuration
Most use cases benefit from configuration to control extraction behavior:
**With OCR (for scanned documents):**
```typescript title="TypeScript"
import { extractFile } from "@kreuzberg/node";
const config = {
ocr: {
backend: "tesseract",
language: "eng+fra",
tesseractConfig: {
psm: 3,
},
},
};
const result = await extractFile("document.pdf", null, config);
console.log(result.content);
```
#### Table Extraction
```typescript title="TypeScript"
import { extractFileSync } from "@kreuzberg/node";
const result = extractFileSync("document.pdf");
result.tables?.forEach((table) => {
console.log(`Table with ${table.cells?.length ?? 0} rows`);
console.log(table.markdown);
table.cells?.forEach((row) => console.log(row.join(" | ")));
});
```
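
The loop above treats `table.cells` as an array of rows, each an array of cell strings, as the `row.join(" | ")` call implies. If you want to render those cells yourself instead of using `table.markdown`, a helper along these lines builds a GitHub-Flavored Markdown table with the first row as the header. `cellsToMarkdown` is a hypothetical name and the `string[][]` shape is an assumption; check `index.d.ts` for the exact type.

```typescript title="TypeScript"
// Hypothetical helper (not part of @kreuzberg/node): renders row-major cells
// as a GitHub-Flavored Markdown table, treating the first row as the header.
function cellsToMarkdown(cells: string[][]): string {
  if (cells.length === 0) return "";
  // Pad ragged rows so every row has the same number of columns.
  const width = Math.max(...cells.map((row) => row.length));
  // Escape pipes and flatten newlines so each cell stays on one line.
  const clean = (cell: string) => cell.replace(/\|/g, "\\|").replace(/\r?\n/g, " ");
  const format = (row: string[]) => {
    const padded = Array.from({ length: width }, (_, i) => clean(row[i] ?? ""));
    return `| ${padded.join(" | ")} |`;
  };
  const [header, ...body] = cells;
  const separator = `| ${Array(width).fill("---").join(" | ")} |`;
  return [format(header), separator, ...body.map(format)].join("\n");
}

console.log(cellsToMarkdown([["Name", "Qty"], ["a|b", "2"]]));
```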
#### Processing Multiple Files
```typescript title="TypeScript"
import { batchExtractFilesSync } from "@kreuzberg/node";
const files = ["doc1.pdf", "doc2.docx", "doc3.pptx"];
const results = batchExtractFilesSync(files);
results.forEach((result, i) => {
console.log(`File ${i + 1}: ${result.content.length} characters`);
});
```
#### Async Processing
For non-blocking document processing:
```typescript title="TypeScript"
import { extractFile } from "@kreuzberg/node";
const result = await extractFile("document.pdf");
console.log(result.content);
```
#### Configuration Discovery
```typescript title="config_discovery.ts"
import { ExtractionConfig, extractFile } from "@kreuzberg/node";
const config = ExtractionConfig.discover();
if (config) {
console.log("Found configuration file");
const result = await extractFile("document.pdf", null, config);
console.log(result.content);
} else {
console.log("No configuration file found, using defaults");
const result = await extractFile("document.pdf");
console.log(result.content);
}
```
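
The README does not spell out how `ExtractionConfig.discover()` locates its file. A common approach for this kind of discovery is to walk from the current directory up to the filesystem root and take the nearest match; the sketch below shows only that general pattern. The candidate file names and the `findConfigFile` helper are assumptions for illustration, so consult the Configuration Guide for Kreuzberg's actual search rules.

```typescript title="TypeScript"
import { existsSync } from "node:fs";
import { dirname, join, resolve } from "node:path";

// Assumed file names, for illustration only; Kreuzberg's real list may differ.
const CANDIDATE_NAMES = ["kreuzberg.toml", "kreuzberg.yaml", "kreuzberg.json"];

// Hypothetical sketch: walk from `startDir` up to the filesystem root and
// return the first (nearest) candidate file that exists, or null.
function findConfigFile(
  startDir: string = process.cwd(),
  names: readonly string[] = CANDIDATE_NAMES,
  exists: (path: string) => boolean = existsSync, // injectable for testing
): string | null {
  let dir = resolve(startDir);
  for (;;) {
    for (const name of names) {
      const candidate = join(dir, name);
      if (exists(candidate)) return candidate;
    }
    const parent = dirname(dir);
    if (parent === dir) return null; // dirname of the root is the root itself
    dir = parent;
  }
}

console.log(findConfigFile() ?? "No configuration file found, using defaults");
```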
#### Worker Thread Pool
```typescript title="worker_pool.ts"
import {
createWorkerPool,
extractFileInWorker,
batchExtractFilesInWorker,
closeWorkerPool,
} from "@kreuzberg/node";
// Create a pool with 4 worker threads
const pool = createWorkerPool(4);
try {
// Extract single file in worker
const result = await extractFileInWorker(pool, "document.pdf", null, {
useCache: true,
});
console.log(result.content);
// Extract multiple files concurrently
const files = ["doc1.pdf", "doc2.docx", "doc3.xlsx"];
const results = await batchExtractFilesInWorker(pool, files, {
useCache: true,
});
results.forEach((result, i) => {
console.log(`File ${i + 1}: ${result.content.length} characters`);
});
} finally {
// Always close the pool when done
await closeWorkerPool(pool);
}
```
**Performance Benefits:**
- **Parallel Processing**: Multiple documents extracted simultaneously
- **CPU Utilization**: Maximizes multi-core CPU usage for large batches
- **Queue Management**: Automatically distributes work across available workers
- **Resource Control**: Prevents thread exhaustion with configurable pool size
**Best Practices:**
- Use worker pools for batches of 10+ documents
- Set pool size to number of CPU cores (default behavior)
- Always close pools with `closeWorkerPool()` to prevent resource leaks
- Reuse pools across multiple batch operations for efficiency
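
Keeping the pool size near the CPU count is a form of bounded concurrency. The same idea applies to the plain async API without a worker pool: the sketch below runs an async function over a list with at most `limit` calls in flight and returns results in input order. `mapWithConcurrency` is a hypothetical helper, and a stub stands in for `extractFile` so the example runs without the native module.

```typescript title="TypeScript"
import { cpus } from "node:os";

// Hypothetical helper: applies `fn` to every item with at most `limit` calls
// in flight and returns the results in input order. If a call rejects, the
// returned promise rejects; other workers are not cancelled and keep going.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  fn: (item: T, index: number) => Promise<R>,
  limit: number = Math.max(1, cpus().length),
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // shared cursor; safe because workers only interleave at `await`
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index], index);
    }
  }
  const workerCount = Math.min(Math.max(1, limit), items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Stand-in for extractFile so the example runs without the native module.
const fakeExtract = async (path: string) => ({ content: `text of ${path}` });

mapWithConcurrency(["doc1.pdf", "doc2.docx", "doc3.xlsx"], fakeExtract, 2).then((results) => {
  results.forEach((result, i) => {
    console.log(`File ${i + 1}: ${result.content.length} characters`);
  });
});
```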
### Next Steps
- **[Installation Guide](https://docs.kreuzberg.dev/getting-started/installation/)** - Platform-specific setup
- **[API Documentation](https://docs.kreuzberg.dev/reference/api-python/)** - Complete API reference
- **[Examples & Guides](https://docs.kreuzberg.dev/)** - Full code examples and usage guides
- **[Configuration Guide](https://docs.kreuzberg.dev/guides/configuration/)** - Advanced configuration options
## NAPI-RS Implementation Details
### Native Performance
This binding uses NAPI-RS to provide native Node.js bindings with:
- **Zero-copy data transfer** between JavaScript and Rust layers
- **Native thread pool** for concurrent document processing
- **Direct memory management** for efficient large document handling
- **Binary-compatible** pre-built native modules across platforms
### Threading Model
- Single documents are processed either synchronously (the `*Sync` functions) or asynchronously on a dedicated thread
- Batch operations distribute work across available CPU cores
- Thread count is configurable but defaults to the system CPU count
- Long-running synchronous extractions block the event loop; use the async APIs to keep it responsive
### Memory Management
- Large documents (> 100 MB) are streamed to avoid loading entirely into memory
- Temporary files are created in system temp directory for extraction
- Memory is automatically released after extraction completion
- ONNX models are cached in memory for repeated embeddings operations
## Features
### Supported File Formats (90+)
90+ file formats across 8 major categories with intelligent format detection and comprehensive metadata extraction.
#### Office Documents
| Category | Formats | Capabilities |
|----------|---------|--------------|
| **Word Processing** | `.docx`, `.docm`, `.dotx`, `.dotm`, `.dot`, `.odt` | Full text, tables, images, metadata, styles |
| **Spreadsheets** | `.xlsx`, `.xlsm`, `.xlsb`, `.xls`, `.xla`, `.xlam`, `.xltm`, `.xltx`, `.xlt`, `.ods` | Sheet data, formulas, cell metadata, charts |
| **Presentations** | `.pptx`, `.pptm`, `.ppsx`, `.potx`, `.potm`, `.pot`, `.ppt` | Slides, speaker notes, images, metadata |
| **PDF** | `.pdf` | Text, tables, images, metadata, OCR support |
| **eBooks** | `.epub`, `.fb2` | Chapters, metadata, embedded resources |
| **Database** | `.dbf` | Table data extraction, field type support |
| **Hangul** | `.hwp`, `.hwpx` | Korean document format, text extraction |
#### Images (OCR-Enabled)
| Category | Formats | Features |
|----------|---------|----------|
| **Raster** | `.png`, `.jpg`, `.jpeg`, `.gif`, `.webp`, `.bmp`, `.tiff`, `.tif` | OCR, table detection, EXIF metadata, dimensions, color space |
| **Advanced** | `.jp2`, `.jpx`, `.jpm`, `.mj2`, `.jbig2`, `.jb2`, `.pnm`, `.pbm`, `.pgm`, `.ppm` | OCR via hayro-jpeg2000 (pure Rust decoder), JBIG2 support, table detection, format-specific metadata |
| **Vector** | `.svg` | DOM parsing, embedded text, graphics metadata |
#### Web & Data
| Category | Formats | Features |
|----------|---------|----------|
| **Markup** | `.html`, `.htm`, `.xhtml`, `.xml`, `.svg` | DOM parsing, metadata (Open Graph, Twitter Card), link extraction |
| **Structured Data** | `.json`, `.yaml`, `.yml`, `.toml`, `.csv`, `.tsv` | Schema detection, nested structures, validation |
| **Text & Markdown** | `.txt`, `.md`, `.markdown`, `.djot`, `.rst`, `.org`, `.rtf` | CommonMark, GFM, Djot, reStructuredText, Org Mode |
#### Email & Archives
| Category | Formats | Features |
|----------|---------|----------|
| **Email** | `.eml`, `.msg` | Headers, body (HTML/plain), attachments, threading |
| **Archives** | `.zip`, `.tar`, `.tgz`, `.gz`, `.7z` | File listing, nested archives, metadata |
#### Academic & Scientific
| Category | Formats | Features |
|----------|---------|----------|
| **Citations** | `.bib`, `.biblatex`, `.ris`, `.nbib`, `.enw`, `.csl` | Structured parsing: RIS (structured), PubMed/MEDLINE, EndNote XML (structured), BibTeX, CSL JSON |
| **Scientific** | `.tex`, `.latex`, `.typst`, `.jats`, `.ipynb`, `.docbook` | LaTeX, Jupyter notebooks, PubMed JATS |
| **Documentation** | `.opml`, `.pod`, `.mdoc`, `.troff` | Technical documentation formats |
#### Code Intelligence (300+ Languages)
| Feature | Description |
|---------|-------------|
| **Structure Extraction** | Functions, classes, methods, structs, interfaces, enums |
| **Import/Export Analysis** | Module dependencies, re-exports, wildcard imports |
| **Symbol Extraction** | Variables, constants, type aliases, properties |
| **Docstring Parsing** | Google, NumPy, Sphinx, JSDoc, RustDoc, and 10+ formats |
| **Diagnostics** | Parse errors with line/column positions |
| **Syntax-Aware Chunking** | Split code by semantic boundaries, not arbitrary byte offsets |
Powered by [tree-sitter-language-pack](https://github.com/kreuzberg-dev/tree-sitter-language-pack) — [documentation](https://docs.tree-sitter-language-pack.kreuzberg.dev).
**[Complete Format Reference](https://docs.kreuzberg.dev/reference/formats/)**
### Key Capabilities
- **Text Extraction** - Extract all text content with position and formatting information
- **Metadata Extraction** - Retrieve document properties, creation date, author, etc.
- **Table Extraction** - Parse tables with structure and cell content preservation
- **Image Extraction** - Extract embedded images and render page previews
- **OCR Support** - Integrate multiple OCR backends for scanned documents
- **Async/Await** - Non-blocking document processing with concurrent operations
- **Plugin System** - Extensible post-processing for custom text transformation
- **Embeddings** - Generate vector embeddings using ONNX Runtime models
- **Batch Processing** - Efficiently process multiple documents in parallel
- **Memory Efficient** - Stream large files without loading entirely into memory
- **Language Detection** - Detect and support multiple languages in documents
- **Code Intelligence** - Extract structure, imports, exports, symbols, and docstrings from [300+ programming languages](https://docs.tree-sitter-language-pack.kreuzberg.dev) via tree-sitter
- **Configuration** - Fine-grained control over extraction behavior
### Performance Characteristics
| Format | Speed | Memory | Notes |
|--------|-------|--------|-------|
| **PDF (text)** | 10-100 MB/s | ~50MB per doc | Fastest extraction |
| **Office docs** | 20-200 MB/s | ~100MB per doc | DOCX, XLSX, PPTX |
| **Images (OCR)** | 1-5 MB/s | Variable | Depends on OCR backend |
| **Archives** | 5-50 MB/s | ~200MB per doc | ZIP, TAR, etc. |
| **Web formats** | 50-200 MB/s | Streaming | HTML, XML, JSON |
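
As a rough rule of thumb from the table, time is approximately size / throughput, so the fastest rate gives the lower bound and the slowest rate the upper bound. A small sketch of that arithmetic, using the table's ranges as inputs (actual timings depend on the document, hardware, and configuration):

```typescript title="TypeScript"
// Lower and upper bounds, in whatever unit the caller uses.
interface Bounds {
  min: number;
  max: number;
}

// time = size / throughput: the highest throughput gives the shortest time.
function estimateSeconds(sizeMB: number, throughputMBps: Bounds): Bounds {
  return { min: sizeMB / throughputMBps.max, max: sizeMB / throughputMBps.min };
}

const pdfText: Bounds = { min: 10, max: 100 }; // "PDF (text)" row, MB/s
const imagesOcr: Bounds = { min: 1, max: 5 }; // "Images (OCR)" row, MB/s
console.log(estimateSeconds(20, pdfText)); // { min: 0.2, max: 2 }
console.log(estimateSeconds(20, imagesOcr)); // { min: 4, max: 20 }
```

So a 20 MB text PDF lands somewhere between 0.2 and 2 seconds, while the same size through OCR takes roughly 4 to 20 seconds.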
## OCR Support
Kreuzberg supports multiple OCR backends for extracting text from scanned documents and images:
- **Tesseract**
- **PaddleOCR**
### OCR Configuration Example
```typescript title="TypeScript"
import { extractFile } from "@kreuzberg/node";
const config = {
ocr: {
backend: "tesseract",
language: "eng+fra",
tesseractConfig: {
psm: 3,
},
},
};
const result = await extractFile("document.pdf", null, config);
console.log(result.content);
```
## Async Support
This binding provides full async/await support for non-blocking document processing:
```typescript title="TypeScript"
import { extractFile } from "@kreuzberg/node";
const result = await extractFile("document.pdf");
console.log(result.content);
```
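
Because `extractFile` returns a Promise, you can start several extractions and await them together. Using `Promise.allSettled` instead of `Promise.all` keeps one unreadable file from discarding the results of the others. In the sketch below the extraction function is a parameter, `extractAll` is a hypothetical helper, and `{ content: string }` is a minimal stand-in for the real result type; a stub replaces `extractFile` so the example runs on its own.

```typescript title="TypeScript"
// Minimal result shape used by the examples in this README (assumed, not the full type).
interface Extracted {
  content: string;
}

// Extract every file concurrently and split successes from failures.
async function extractAll(
  paths: string[],
  extract: (path: string) => Promise<Extracted>,
): Promise<{ ok: Map<string, Extracted>; failed: Map<string, unknown> }> {
  const settled = await Promise.allSettled(paths.map((p) => extract(p)));
  const ok = new Map<string, Extracted>();
  const failed = new Map<string, unknown>();
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") ok.set(paths[i], outcome.value);
    else failed.set(paths[i], outcome.reason);
  });
  return { ok, failed };
}

// In real code, pass `extractFile` from "@kreuzberg/node" instead of this stub.
const stub = async (path: string): Promise<Extracted> => {
  if (path.endsWith(".bad")) throw new Error(`cannot parse ${path}`);
  return { content: `text of ${path}` };
};

extractAll(["a.pdf", "b.bad"], stub).then(({ ok, failed }) => {
  console.log(`${ok.size} succeeded, ${failed.size} failed`);
});
```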
## Plugin System
Kreuzberg supports extensible post-processing plugins for custom text transformation and filtering.
For detailed plugin documentation, visit [Plugin System Guide](https://docs.kreuzberg.dev/guides/plugins/).
## Embeddings Support
Generate vector embeddings for extracted text using the built-in ONNX Runtime support. Requires ONNX Runtime installation.
**[Embeddings Guide](https://docs.kreuzberg.dev/features/#embeddings)**
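
Once you have embeddings, a common next step is ranking text chunks against a query by cosine similarity. The sketch below assumes embeddings are available as plain `number[]` vectors (check `index.d.ts` for where they appear on the result); the helper and the toy vectors are illustrative and not part of `@kreuzberg/node`.

```typescript title="TypeScript"
// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for non-zero vectors.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same dimension");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // undefined for zero vectors; 0 by convention
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank chunks by similarity to a query embedding (toy 3-dimensional vectors).
const query = [0.1, 0.9, 0.0];
const chunks = [
  { text: "chunk A", embedding: [0.1, 0.8, 0.1] },
  { text: "chunk B", embedding: [0.9, 0.1, 0.0] },
];
const ranked = chunks
  .map((c) => ({ ...c, score: cosineSimilarity(query, c.embedding) }))
  .sort((x, y) => y.score - x.score);
console.log(ranked.map((c) => `${c.text}: ${c.score.toFixed(3)}`));
```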
## Batch Processing
Process multiple documents efficiently:
```typescript title="TypeScript"
import { batchExtractFilesSync } from "@kreuzberg/node";
const files = ["doc1.pdf", "doc2.docx", "doc3.pptx"];
const results = batchExtractFilesSync(files);
results.forEach((result, i) => {
console.log(`File ${i + 1}: ${result.content.length} characters`);
});
```
## Configuration
For advanced configuration options including language detection, table extraction, OCR settings, and more:
**[Configuration Guide](https://docs.kreuzberg.dev/guides/configuration/)**
## Documentation
- **[Official Documentation](https://docs.kreuzberg.dev/)**
- **[API Reference](https://docs.kreuzberg.dev/reference/api-python/)**
- **[Examples & Guides](https://docs.kreuzberg.dev/)**
## Contributing
Contributions are welcome! See [Contributing Guide](https://github.com/kreuzberg-dev/kreuzberg/blob/main/CONTRIBUTING.md).
## Part of Kreuzberg.dev
- [Kreuzberg Cloud](https://github.com/kreuzberg-dev/kreuzberg-cloud) — managed extraction API with SDKs, dashboards, and observability.
- [kreuzcrawl](https://github.com/kreuzberg-dev/kreuzcrawl) — web crawling and scraping with HTML→Markdown and headless-Chrome fallback.
- [html-to-markdown](https://github.com/kreuzberg-dev/html-to-markdown) — fast, lossless HTML→Markdown engine.
- [liter-llm](https://github.com/kreuzberg-dev/liter-llm) — universal LLM API client with native bindings for 14 languages and 143 providers.
- [tree-sitter-language-pack](https://github.com/kreuzberg-dev/tree-sitter-language-pack) — tree-sitter grammars and code-intelligence primitives.
- [alef](https://github.com/kreuzberg-dev/alef) — the polyglot binding generator that produces this README and all per-language bindings.
- [Discord](https://discord.gg/xt9WY3GnKR) — community, roadmap, announcements.
## License
Elastic-2.0 License — see [LICENSE](../../LICENSE) for details.
## Support
- **Discord Community**: [Join our Discord](https://discord.gg/xt9WY3GnKR)
- **GitHub Issues**: [Report bugs](https://github.com/kreuzberg-dev/kreuzberg/issues)
- **Discussions**: [Ask questions](https://github.com/kreuzberg-dev/kreuzberg/discussions)


@@ -0,0 +1,27 @@
// Wrap JsFormatMetadata to add getters for format-specific metadata
// This works around the limitation that #[napi(getter)] doesn't work on #[napi(object)]
export function wrapFormatMetadata(fmt) {
if (!fmt || typeof fmt !== "object") return fmt;
const tag = fmt.format_type;
const payload = fmt["0"];
if (!payload || typeof tag !== "string") return fmt;
try {
const data = JSON.parse(payload);
// Add the typed variant property as a non-enumerable property
Object.defineProperty(fmt, tag, {
value: data,
enumerable: false,
writable: false,
configurable: false,
});
} catch {
// Ignore malformed JSON payloads and defineProperty failures (e.g. on frozen objects)
}
return fmt;
}

5488
crates/kreuzberg-node/index.d.ts generated vendored Normal file

File diff suppressed because it is too large


@@ -0,0 +1,15 @@
{
"name": "@kreuzberg/node-darwin-arm64",
"version": "5.0.0-rc.3",
"license": "Elastic-2.0",
"repository": {
"type": "git",
"url": "git+https://github.com/kreuzberg-dev/kreuzberg.git"
},
"main": "kreuzberg-node.darwin-arm64.node",
"files": ["kreuzberg-node.darwin-arm64.node"],
"os": ["darwin"],
"cpu": ["arm64"],
"engines": { "node": ">= 18" },
"publishConfig": { "access": "public" }
}


@@ -0,0 +1,15 @@
{
"name": "@kreuzberg/node-darwin-x64",
"version": "5.0.0-rc.3",
"license": "Elastic-2.0",
"repository": {
"type": "git",
"url": "git+https://github.com/kreuzberg-dev/kreuzberg.git"
},
"main": "kreuzberg-node.darwin-x64.node",
"files": ["kreuzberg-node.darwin-x64.node"],
"os": ["darwin"],
"cpu": ["x64"],
"engines": { "node": ">= 18" },
"publishConfig": { "access": "public" }
}


@@ -0,0 +1,16 @@
{
"name": "@kreuzberg/node-linux-arm64-gnu",
"version": "5.0.0-rc.3",
"license": "Elastic-2.0",
"repository": {
"type": "git",
"url": "git+https://github.com/kreuzberg-dev/kreuzberg.git"
},
"main": "kreuzberg-node.linux-arm64-gnu.node",
"files": ["kreuzberg-node.linux-arm64-gnu.node"],
"os": ["linux"],
"cpu": ["arm64"],
"libc": ["glibc"],
"engines": { "node": ">= 18" },
"publishConfig": { "access": "public" }
}


@@ -0,0 +1,16 @@
{
"name": "@kreuzberg/node-linux-arm64-musl",
"version": "5.0.0-rc.3",
"license": "Elastic-2.0",
"repository": {
"type": "git",
"url": "git+https://github.com/kreuzberg-dev/kreuzberg.git"
},
"main": "kreuzberg-node.linux-arm64-musl.node",
"files": ["kreuzberg-node.linux-arm64-musl.node"],
"os": ["linux"],
"cpu": ["arm64"],
"libc": ["musl"],
"engines": { "node": ">= 18" },
"publishConfig": { "access": "public" }
}


@@ -0,0 +1,16 @@
{
"name": "@kreuzberg/node-linux-x64-gnu",
"version": "5.0.0-rc.3",
"license": "Elastic-2.0",
"repository": {
"type": "git",
"url": "git+https://github.com/kreuzberg-dev/kreuzberg.git"
},
"main": "kreuzberg-node.linux-x64-gnu.node",
"files": ["kreuzberg-node.linux-x64-gnu.node"],
"os": ["linux"],
"cpu": ["x64"],
"libc": ["glibc"],
"engines": { "node": ">= 18" },
"publishConfig": { "access": "public" }
}


@@ -0,0 +1,16 @@
{
"name": "@kreuzberg/node-linux-x64-musl",
"version": "5.0.0-rc.3",
"license": "Elastic-2.0",
"repository": {
"type": "git",
"url": "git+https://github.com/kreuzberg-dev/kreuzberg.git"
},
"main": "kreuzberg-node.linux-x64-musl.node",
"files": ["kreuzberg-node.linux-x64-musl.node"],
"os": ["linux"],
"cpu": ["x64"],
"libc": ["musl"],
"engines": { "node": ">= 18" },
"publishConfig": { "access": "public" }
}


@@ -0,0 +1,15 @@
{
"name": "@kreuzberg/node-win32-arm64-msvc",
"version": "5.0.0-rc.3",
"license": "Elastic-2.0",
"repository": {
"type": "git",
"url": "git+https://github.com/kreuzberg-dev/kreuzberg.git"
},
"main": "kreuzberg-node.win32-arm64-msvc.node",
"files": ["kreuzberg-node.win32-arm64-msvc.node"],
"os": ["win32"],
"cpu": ["arm64"],
"engines": { "node": ">= 18" },
"publishConfig": { "access": "public" }
}


@@ -0,0 +1,15 @@
{
"name": "@kreuzberg/node-win32-x64-msvc",
"version": "5.0.0-rc.3",
"license": "Elastic-2.0",
"repository": {
"type": "git",
"url": "git+https://github.com/kreuzberg-dev/kreuzberg.git"
},
"main": "kreuzberg-node.win32-x64-msvc.node",
"files": ["kreuzberg-node.win32-x64-msvc.node"],
"os": ["win32"],
"cpu": ["x64"],
"engines": { "node": ">= 18" },
"publishConfig": { "access": "public" }
}

52
crates/kreuzberg-node/package.json generated Normal file

@@ -0,0 +1,52 @@
{
"name": "@kreuzberg/node",
"version": "5.0.0-rc.3",
"description": "High-performance document intelligence library",
"license": "Elastic-2.0",
"repository": {
"type": "git",
"url": "git+https://github.com/kreuzberg-dev/kreuzberg.git"
},
"main": "index.js",
"types": "index.d.ts",
"exports": {
".": {
"types": "./index.d.ts",
"require": "./index.js",
"default": "./index.js"
}
},
"files": ["index.js", "index.d.ts", "*.node"],
"optionalDependencies": {
"@kreuzberg/node-linux-x64-gnu": "5.0.0-rc.3",
"@kreuzberg/node-linux-arm64-gnu": "5.0.0-rc.3",
"@kreuzberg/node-linux-x64-musl": "5.0.0-rc.3",
"@kreuzberg/node-linux-arm64-musl": "5.0.0-rc.3",
"@kreuzberg/node-darwin-x64": "5.0.0-rc.3",
"@kreuzberg/node-darwin-arm64": "5.0.0-rc.3",
"@kreuzberg/node-win32-x64-msvc": "5.0.0-rc.3",
"@kreuzberg/node-win32-arm64-msvc": "5.0.0-rc.3"
},
"napi": {
"packageName": "@kreuzberg/node",
"binaryName": "kreuzberg-node",
"targets": [
"x86_64-unknown-linux-gnu",
"aarch64-unknown-linux-gnu",
"x86_64-unknown-linux-musl",
"aarch64-unknown-linux-musl",
"x86_64-apple-darwin",
"aarch64-apple-darwin",
"x86_64-pc-windows-msvc",
"aarch64-pc-windows-msvc"
]
},
"scripts": {
"build": "napi build --platform --release",
"artifacts": "napi artifacts",
"prepublishOnly": "napi prepublish -t npm --skip-optional-publish"
},
"engines": { "node": ">= 18" },
"publishConfig": { "access": "public" },
"devDependencies": { "@napi-rs/cli": "^3.6.2" }
}

15166
crates/kreuzberg-node/src/lib.rs generated Normal file

File diff suppressed because it is too large