What's Changed
Added
- WASM Office Format Support: DOCX, PPTX, RTF, reStructuredText, Org-mode, FictionBook, Typst, BibTeX, and Markdown extraction now available in the browser/WASM build
- Citation extraction for RIS, PubMed/MEDLINE, and EndNote XML formats
- JPEG 2000 and JBIG2 image decoding for OCR (pure Rust)
- Gzip archive extraction with bomb protection
- WASM integration tests and e2e fixtures for all new office formats
- Security limits for archive extraction pipeline
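
For readers unfamiliar with "bomb protection", the general idea is to cap how many decompressed bytes are accepted from an archive stream. The following is a minimal sketch of that technique using only the Rust standard library. It is not taken from the Kreuzberg source: `read_capped` and `max_bytes` are hypothetical names, and the reader passed in stands in for whatever gzip decoder the real pipeline uses.

```rust
use std::io::{self, Read};

/// Reads all of `reader`, failing if it yields more than `max_bytes`.
/// Hypothetical helper for illustration; not part of the Kreuzberg API.
pub fn read_capped<R: Read>(reader: R, max_bytes: u64) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    // Allow one byte past the cap so "exactly at the limit" and
    // "over the limit" can be told apart.
    let mut limited = reader.take(max_bytes.saturating_add(1));
    limited.read_to_end(&mut out)?;
    if out.len() as u64 > max_bytes {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "decompressed size exceeds configured limit",
        ));
    }
    Ok(out)
}
```

Because the cap is applied to the decompressed side of the stream, a small compressed input that expands without bound is rejected after at most `max_bytes + 1` bytes have been buffered.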
Fixed
- WASM Build: Fixed `zstd-sys` and `tokio`/`mio` compilation failures for `wasm32-unknown-unknown` when office feature enabled
- MIME type detection: case-insensitive validation, missing types synced with extractor registry
- YAML, Typst, Djot file format recognition
- Benchmark harness fixes for Go, Java, PHP, Elixir, WASM wrappers
Full Changelog: https://github.com/kreuzberg-dev/kreuzberg/blob/main/CHANGELOG.md#4213---2026-02-07