Added
HTML Configuration Support
- Full html_options configuration: The html_options field in ExtractionConfig is now fully configurable from config files (TOML/YAML/JSON) and all language bindings (#282)
  - Upgraded html-to-markdown-rs to v2.21.1 with serde support
  - Configure heading styles, code block styles, list formatting, text wrapping, and more
  - Replaces v3's HTMLToMarkdownConfig with more comprehensive options
  - See migration guide for available options and examples
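As a rough illustration of the config-file route, the sketch below reads an html_options table from a JSON config. The key names inside html_options (heading_style, code_block_style, wrap, wrap_width) and the helper load_html_options are hypothetical placeholders chosen for this example; the migration guide is the authority on the actual option names and values.

```python
import json

# Hypothetical config file contents. The keys under "html_options" are
# illustrative assumptions, not confirmed option names.
CONFIG_JSON = """
{
  "html_options": {
    "heading_style": "atx",
    "code_block_style": "backticks",
    "wrap": true,
    "wrap_width": 80
  }
}
"""


def load_html_options(text: str) -> dict:
    """Return the html_options table from a JSON config, or {} if absent."""
    config = json.loads(text)
    return config.get("html_options", {})


html_options = load_html_options(CONFIG_JSON)
print(html_options)
```

The same table shape should carry over to TOML or YAML config files, assuming the option names match across formats.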
Fixed
Go Module
- Fixed header include path for external users: plugins_test_helpers.go now uses the bundled header at internal/ffi/kreuzberg.h instead of a relative path to the monorepo (#280)
C# SDK
- Keyword extraction deserialization: Fixed JsonException when using keyword extraction; keywords are now properly deserialized as ExtractedKeyword objects (#285)
Documentation
- Rust OCR code examples: Fixed incorrect Some(...) wrapper in OcrConfig examples (#284)
Tests
- Flaky concurrent interning test: Marked as #[ignore] to prevent intermittent CI failures
Distribution
- Homebrew tap visibility: Made kreuzberg-dev/homebrew-tap repository public (#283)
Full Changelog: v4.0.2...v4.0.3