github kreuzberg-dev/kreuzberg v4.6.2


[4.6.2] - 2026-03-26

Added

  • PDF page rendering API (#583): New render_pdf_page function and PdfPageIterator for rendering individual PDF pages as PNG images. Available across all 11 language bindings with idiomatic patterns (Python context manager, Go Close(), Java AutoCloseable, C# IDisposable, Elixir Stream, etc.). Default 150 DPI, configurable per call.
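
As a rough guide to what the 150 DPI default means for output size, the sketch below converts page dimensions in PDF points (1/72 inch) to pixels. The helper name rendered_size_px is hypothetical and is not part of the kreuzberg API; the renderer's exact rounding behaviour is also an assumption here.

```python
POINTS_PER_INCH = 72  # PDF user space: 1 point = 1/72 inch


def rendered_size_px(width_pt: float, height_pt: float, dpi: int = 150) -> tuple[int, int]:
    """Approximate pixel size of a page rendered at `dpi` (hypothetical helper).

    Assumes simple rounding to the nearest pixel; a real renderer may
    round up or down instead.
    """
    return (
        round(width_pt * dpi / POINTS_PER_INCH),
        round(height_pt * dpi / POINTS_PER_INCH),
    )


if __name__ == "__main__":
    # US Letter is 612 x 792 points (8.5 x 11 inches).
    print(rendered_size_px(612, 792))       # (1275, 1650) at the 150 DPI default
    print(rendered_size_px(612, 792, 300))  # (2550, 3300)
```

Doubling the DPI doubles each dimension and so quadruples the pixel count, which is worth keeping in mind when overriding the default per call.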

Fixed

  • Table recognition coordinate mismatch on scanned PDFs (#582): Layout detection bounding boxes (640x640 model space) are now scaled to the OCR render resolution before TATR table recognition. Previously, the coordinate-space mismatch caused no tables to be found.
  • OCR elements reported page_number: 1 for all pages (#582): Tesseract resets page numbers for each single-page render, so every element came back as page 1. Page numbers are now stamped after OCR in the batch loop.
  • Rust E2E tests missing PDF feature: Added pdf feature to the e2e-generator Rust template, fixing 41 UnsupportedFormat("application/pdf") failures.
  • HWP styled extraction empty on ARM: Added skip_on_platform support to Python and Java e2e generators, skipping the hwp_styled fixture on aarch64-unknown-linux-gnu.
  • WASM CI build failure: Made kreuzberg-node prepare script resilient to missing native addon, preventing ENOENT: dist/cli.js during pnpm workspace install.
  • Go C header stale at 4.5.0: Synced header and DefaultVersion constant to match current version.
  • Ruby gem missing ONNX Runtime: Added ort-bundled feature to Ruby native Cargo.toml.
  • Elixir doctest failures: Updated ExtractionConfig.to_map/1 doctests for force_ocr_pages field.
  • WASM benchmark timeout: Reduced per-extraction timeout from 600s to 120s and job timeout from 6h to 2h.
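
The two #582 fixes above can be illustrated with a minimal sketch. It is not the kreuzberg implementation: BBox, scale_bbox, and stamp_page_numbers are hypothetical names, and the scaling assumes the page image was stretched to 640x640 without letterbox padding.

```python
from dataclasses import dataclass

MODEL_SIZE = 640  # layout model space is 640x640, per the changelog


@dataclass(frozen=True)
class BBox:
    """Axis-aligned box as (x0, y0, x1, y1); hypothetical type for this sketch."""
    x0: float
    y0: float
    x1: float
    y1: float


def scale_bbox(box: BBox, render_w: int, render_h: int, model_size: int = MODEL_SIZE) -> BBox:
    """Map a box from model space to the OCR render resolution.

    Assumes a plain stretch to model_size x model_size (no letterboxing).
    """
    sx = render_w / model_size
    sy = render_h / model_size
    return BBox(box.x0 * sx, box.y0 * sy, box.x1 * sx, box.y1 * sy)


def stamp_page_numbers(pages: list[list[dict]]) -> list[dict]:
    """Overwrite page_number on each element with its position in the batch.

    `pages` holds one list of OCR elements per rendered page; each element
    arrives with page_number 1 because every page was OCR'd on its own.
    """
    stamped = []
    for page_number, elements in enumerate(pages, start=1):
        for element in elements:
            stamped.append({**element, "page_number": page_number})
    return stamped


if __name__ == "__main__":
    # A table region detected in model space, mapped onto a 1275x1650 render.
    print(scale_bbox(BBox(64, 128, 320, 640), 1275, 1650))
    ocr_pages = [
        [{"text": "a", "page_number": 1}],
        [{"text": "b", "page_number": 1}],
    ]
    print([e["page_number"] for e in stamp_page_numbers(ocr_pages)])
```

Without the scaling step, a box in 640x640 coordinates covers only a corner of the much larger rendered page, which is consistent with the reported symptom of zero tables being found.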

Improved

  • version:sync now syncs Go C header, DefaultVersion, and Docker compose tags: Prevents version drift across language bindings.
  • Publish pipeline commits Elixir NIF checksums back to main: Prevents stale checksums after releases.
  • WASM test app migrated to Deno: Replaced Node.js/vitest with Deno test runner, fixing fetch() unavailability.
  • Docs migrated from MkDocs to Zensical: 4-5x faster incremental builds.
