[4.6.2] - 2026-03-26
Added
- PDF page rendering API (#583): New `render_pdf_page` function and `PdfPageIterator` for rendering individual PDF pages as PNG images. Available across all 11 language bindings with idiomatic patterns (Python context manager, Go Close(), Java AutoCloseable, C# IDisposable, Elixir Stream, etc.). Default 150 DPI, configurable per call.
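
For a sense of what the 150 DPI default means in output size, the sketch below converts a page size in PDF points (72 per inch) to pixel dimensions. It is an illustration only: `rendered_size` is a hypothetical helper, not part of the library, and the renderer's actual rounding may differ.

```python
POINTS_PER_INCH = 72  # PDF user-space units per inch


def rendered_size(width_pt: float, height_pt: float, dpi: int = 150) -> tuple[int, int]:
    """Return the (width, height) in pixels of a page rendered at `dpi`.

    Hypothetical helper for illustration; rounding to the nearest pixel
    is an assumption, not the library's documented behavior.
    """
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


# US Letter is 612 x 792 points.
print(rendered_size(612, 792))       # (1275, 1650)
print(rendered_size(612, 792, 300))  # (2550, 3300)
```

Doubling the DPI doubles each dimension, so the pixel count (and roughly the PNG memory footprint before compression) grows fourfold.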
Fixed
- Table recognition coordinate mismatch on scanned PDFs (#582): Layout detection bboxes (640x640 model space) are now scaled to OCR render resolution before TATR table recognition. Previously, coordinate space mismatch caused zero tables to be found.
- OCR elements report `page_number: 1` for all pages (#582): Tesseract resets page numbers per single-page render. Page numbers are now correctly stamped after OCR in the batch loop.
- Rust E2E tests missing PDF feature: Added `pdf` feature to the e2e-generator Rust template, fixing 41 `UnsupportedFormat("application/pdf")` failures.
- HWP styled extraction empty on ARM: Added `skip_on_platform` support to Python and Java e2e generators, skipping the `hwp_styled` fixture on `aarch64-unknown-linux-gnu`.
- WASM CI build failure: Made `kreuzberg-node` prepare script resilient to missing native addon, preventing `ENOENT: dist/cli.js` during pnpm workspace install.
- Go C header stale at 4.5.0: Synced header and `DefaultVersion` constant to match current version.
- Ruby gem missing ONNX Runtime: Added `ort-bundled` feature to Ruby native Cargo.toml.
- Elixir doctest failures: Updated `ExtractionConfig.to_map/1` doctests for `force_ocr_pages` field.
- WASM benchmark timeout: Reduced per-extraction timeout from 600s to 120s and job timeout from 6h to 2h.
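
The two #582 fixes above can be sketched in a few lines. This is a minimal illustration, not the project's code: `BBox`, `scale_bbox`, and `stamp_page_numbers` are hypothetical names, and the scaling assumes the page was stretched to the 640x640 model input without letterboxing.

```python
from dataclasses import dataclass

MODEL_SIZE = 640  # layout model input is 640x640, per the changelog entry


@dataclass
class BBox:
    x0: float
    y0: float
    x1: float
    y1: float


def scale_bbox(box: BBox, render_w: int, render_h: int) -> BBox:
    """Map a box from model space to rendered-page pixel space.

    Assumes a plain stretch to 640x640 (no letterbox padding).
    """
    sx = render_w / MODEL_SIZE
    sy = render_h / MODEL_SIZE
    return BBox(box.x0 * sx, box.y0 * sy, box.x1 * sx, box.y1 * sy)


def stamp_page_numbers(pages: list[list[dict]]) -> list[dict]:
    """Flatten per-page OCR elements, overwriting page_number with the
    1-based position of the page in the batch."""
    stamped = []
    for page_number, elements in enumerate(pages, start=1):
        for element in elements:
            stamped.append({**element, "page_number": page_number})
    return stamped


# A box in model space mapped onto a 1280x1920 render (scale 2 in x, 3 in y).
print(scale_bbox(BBox(64, 128, 320, 512), 1280, 1920))
# BBox(x0=128.0, y0=384.0, x1=640.0, y1=1536.0)

# Each single-page OCR run reports page 1; the batch loop corrects it.
ocr_batch = [[{"text": "a", "page_number": 1}], [{"text": "b", "page_number": 1}]]
print([e["page_number"] for e in stamp_page_numbers(ocr_batch)])  # [1, 2]
```

Without the scaling step, boxes expressed in 640x640 coordinates would point at the wrong region of a larger render, which is consistent with the zero-tables symptom described above.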
Improved
- `version:sync` now syncs Go C header, DefaultVersion, and Docker compose tags: Prevents version drift across language bindings.
- Publish pipeline commits Elixir NIF checksums back to main: Prevents stale checksums after releases.
- WASM test app migrated to Deno: Replaced Node.js/vitest with Deno test runner, fixing `fetch()` unavailability.
- Docs migrated from MkDocs to Zensical: 4-5x faster incremental builds.