9.4 KiB
HTML Output
!!! Info "Added in v4.8.1"
Render extracted document content as styled HTML with semantic kb-* CSS classes, configurable themes, and full CSS customization.
Quick Start
=== "CLI"
```bash title="Terminal"
kreuzberg extract doc.pdf --html-theme github
```
=== "Python"
```python title="html_output.py"
from kreuzberg import ExtractionConfig, HtmlOutputConfig, HtmlTheme, extract_file
config = ExtractionConfig(
output_format="html",
html_output=HtmlOutputConfig(theme=HtmlTheme.GitHub),
)
result = await extract_file("doc.pdf", config=config)
print(result.content) # styled HTML string
```
=== "TypeScript"
```typescript title="html_output.ts"
import { extractFile, HtmlTheme } from '@kreuzberg/node';
const result = await extractFile('doc.pdf', {
outputFormat: 'html',
htmlOutput: { theme: HtmlTheme.GitHub },
});
console.log(result.content);
```
=== "Rust"
```rust title="html_output.rs"
use kreuzberg::{extract_file, ExtractionConfig, HtmlOutputConfig, HtmlTheme};
let config = ExtractionConfig {
output_format: "html".to_string(),
html_output: Some(HtmlOutputConfig {
theme: HtmlTheme::GitHub,
..Default::default()
}),
..Default::default()
};
let result = extract_file("doc.pdf", None, &config).await?;
println!("{}", result.content);
```
Built-in Themes
| Theme | Description |
|---|---|
unstyled (default) |
No built-in CSS. Only structural markup with kb-* classes. Use your own style sheet. |
default |
System font stack, neutral colours, 72ch max width. All CSS custom properties defined. |
github |
GitHub Markdown-inspired palette, border-bottom headings, 80ch max width. |
dark |
Dark background (#0d1117), light text. Good for terminal/IDE integrations. |
light |
Minimal light theme with generous spacing. |
Configuration
See HtmlOutputConfig for detailed field documentation.
=== "Python"
```python title="html_config.py"
from kreuzberg import ExtractionConfig, HtmlOutputConfig, HtmlTheme
config = ExtractionConfig(
output_format="html",
html_output=HtmlOutputConfig(
theme=HtmlTheme.Dark,
css="body { padding: 2rem; }",
class_prefix="kb-",
embed_css=True,
),
)
```
=== "TypeScript"
```typescript title="html_config.ts"
import { HtmlTheme } from '@kreuzberg/node';
const config = {
outputFormat: 'html',
htmlOutput: {
theme: HtmlTheme.Dark,
css: 'body { padding: 2rem; }',
classPrefix: 'kb-',
embedCss: true,
},
};
```
=== "Rust"
```rust title="html_config.rs"
use kreuzberg::{ExtractionConfig, HtmlOutputConfig, HtmlTheme};
let config = ExtractionConfig {
output_format: "html".to_string(),
html_output: Some(HtmlOutputConfig {
theme: HtmlTheme::Dark,
css: Some("body { padding: 2rem; }".to_string()),
class_prefix: "kb-".to_string(),
embed_css: true,
..Default::default()
}),
..Default::default()
};
```
CLI Flags
| Flag | Description |
|---|---|
--html-theme <THEME> |
Built-in theme: default, github, dark, light, unstyled. Implies --content-format html. |
--html-css <CSS> |
Inline CSS string appended after the theme stylesheet. |
--html-css-file <PATH> |
Path to CSS file loaded at render time (max 1 MiB). |
--html-class-prefix <PREFIX> |
CSS class prefix; default: "kb-". Alphanumeric, hyphens, underscores only. |
--html-no-embed-css |
Suppress the <style> block; use external stylesheet instead. |
CSS Customization
All built-in themes (except unstyled) define CSS custom properties on :root. Override them to adjust the theme without replacing it entirely:
:root {
--kb-font-family: "Inter", sans-serif;
--kb-text-color: #333;
--kb-max-width: 60ch;
}
Pass custom CSS inline or from a file:
=== "CLI"
```bash title="Terminal"
# Inline override
kreuzberg extract doc.pdf --html-theme github \
--html-css ':root { --kb-max-width: 60ch; }'
# From a file
kreuzberg extract doc.pdf --html-theme github \
--html-css-file custom.css
```
=== "Python"
```python title="custom_css.py"
from kreuzberg import ExtractionConfig, HtmlOutputConfig, HtmlTheme
config = ExtractionConfig(
output_format="html",
html_output=HtmlOutputConfig(
theme=HtmlTheme.GitHub,
css_file="custom.css",
),
)
```
=== "Rust"
```rust title="custom_css.rs"
use kreuzberg::{ExtractionConfig, HtmlOutputConfig, HtmlTheme};
use std::path::PathBuf;
let config = ExtractionConfig {
output_format: "html".to_string(),
html_output: Some(HtmlOutputConfig {
theme: HtmlTheme::GitHub,
css_file: Some(PathBuf::from("custom.css")),
..Default::default()
}),
..Default::default()
};
```
To use your own style sheet, set the theme to unstyled and disable the embedded <style> block:
config = ExtractionConfig(
output_format="html",
html_output=HtmlOutputConfig(
theme=HtmlTheme.Unstyled,
embed_css=False,
),
)
Class Reference
All generated HTML elements include semantic kb-* classes for targeted styling.
| Class | Element | Description |
|---|---|---|
kb-doc |
<div> |
Root wrapper |
kb-content |
<main> |
Content area |
kb-doc-title |
<h1> |
Document title |
kb-h, kb-h1..kb-h6 |
<h1>..<h6> |
Headings |
kb-p |
<p> |
Paragraphs |
kb-list, kb-ul, kb-ol |
<ul>, <ol> |
Lists |
kb-li |
<li> |
List items |
kb-blockquote |
<blockquote> |
Block quotes |
kb-pre |
<pre> |
Code blocks |
kb-code |
<code> |
Inline/block code |
kb-table |
<table> |
Tables |
kb-thead, kb-tbody |
<thead>, <tbody> |
Table sections |
kb-th, kb-td, kb-tr |
<th>, <td>, <tr> |
Table cells/rows |
kb-figure |
<figure> |
Image wrapper |
kb-img |
<img> |
Images |
kb-page-break |
<hr> |
Page breaks |
kb-footnote |
<aside> |
Footnote definitions |
kb-footnote-ref |
<sup> |
Footnote references |
kb-citation |
<cite> |
Citations |
kb-link |
<a> |
Hyperlinks |
kb-metadata |
<dl> |
Metadata blocks |
kb-formula |
<pre> |
Math formulas |
kb-slide |
<section> |
Slide sections |
kb-dt, kb-dd |
<dt>, <dd> |
Definition terms/descriptions |
kb-admonition |
<aside> |
Admonitions |
kb-group |
<div> |
Grouped content |
!!! Tip "Custom prefix" If you set class_prefix to "my-", all classes become my-doc, my-content, my-h1, and so on.
Security
!!! Warning "Security considerations" - class_prefix is validated to prevent HTML injection - </style> sequences are stripped from user CSS - css_file is limited to 1 MiB - When serving HTML to untrusted users, sanitize CSS at the application layer
See Also
- Configuration -- all configuration options
- Extraction Basics -- core extraction API and supported formats
- Element-Based Output -- structured element output as an alternative to HTML
- Document Structure -- how Kreuzberg models document structure