github kreuzberg-dev/kreuzberg v4.7.2

one day ago

What's Changed

Added

  • E2E generator published mode — Generate standalone test apps against published registry versions for all 12 language bindings

Changed

  • Global model cache (#641) — Models now download to a platform-appropriate global cache directory instead of per-directory .kreuzberg/ folders
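As a rough illustration of what "platform-appropriate global cache directory" usually means, the sketch below picks a conventional per-user cache location for each operating system. The helper name `global_cache_dir` and the exact paths are assumptions based on common platform conventions (XDG on Linux, `~/Library/Caches` on macOS, `%LOCALAPPDATA%` on Windows); the release note does not state which directories kreuzberg actually uses.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def global_cache_dir(
    app: str = "kreuzberg",
    platform: str = sys.platform,
    env: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
) -> Path:
    """Hypothetical helper: choose a conventional per-user cache directory.

    The paths follow general OS conventions and are NOT taken from the
    kreuzberg source; treat them as assumptions.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else home

    if platform.startswith("win"):
        # Windows: per-user local application data
        base = Path(env.get("LOCALAPPDATA") or home / "AppData" / "Local")
    elif platform == "darwin":
        # macOS: per-user Caches folder
        base = home / "Library" / "Caches"
    else:
        # Linux and other Unix-likes: XDG cache directory, else ~/.cache
        base = Path(env.get("XDG_CACHE_HOME") or home / ".cache")

    return base / app
```

A single shared location like this means a model is downloaded once per user rather than once per working directory, which is the behaviour change the entry describes.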

Fixed

  • Leptonica DPI crash (#606) — Images with 0 DPI caused C++ exception abort during preprocessing. Now validates and fixes DPI to 72 before preprocessing. Also disabled C++ exception handling on Windows MSVC builds.
  • Embedded HTML in PDF text layers — PDFs with raw HTML in text layer produced escaped garbage. Now detected and converted to clean markdown.
  • Code classification false positives — Layout model sometimes classified regular prose as Code blocks. Added prose guard.
  • PageBreak rendering as separators — PageBreak elements rendered as ----- in output. Now treated as structural metadata.
  • Node.js ExtractionResult.children missing at runtime — Field was in TypeScript definitions but absent from runtime NAPI object.
  • Node.js disable_ocr config not respected — disableOcr: true still produced OCR content for images.
  • C# Serialization class inaccessible — Class had insufficient access level in published NuGet package.
  • Java PdfAnnotation missing getters — Added getContent() and getPageNumber() methods.
  • Java Table missing getters — Added getCells(), getMarkdown(), and getPageNumber() methods.
  • PaddleOCR angle classification crash (#643) — Fixed input dimensions for V2 angle classifier model.
  • Centralized concurrency controls — Fixed 5 places bypassing resolve_thread_budget().
  • Chunk page numbers missing (#636) — Fixed first_page/last_page being null when chunking was configured.
  • Ruby OCR backend — Added missing ocr_internal_document field.
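The Leptonica DPI fix (#606) in the list above amounts to a validation step that runs before image preprocessing. The following is a minimal sketch of such a guard, not the project's actual code: the function name `normalize_dpi` is hypothetical, and treating negative or missing values the same as 0 is an assumption beyond what the note states. Only the fallback value of 72 comes from the release note.

```python
DEFAULT_DPI = 72  # fallback value named in the release note


def normalize_dpi(x_dpi, y_dpi):
    """Hypothetical guard: return a usable (x, y) DPI pair.

    Any value that is missing, zero, or negative is replaced with
    DEFAULT_DPI so later preprocessing never sees an invalid resolution.
    """
    def fix(value):
        is_number = isinstance(value, (int, float))
        return value if is_number and value > 0 else DEFAULT_DPI

    return fix(x_dpi), fix(y_dpi)
```

Under these assumptions, `normalize_dpi(0, 0)` yields `(72, 72)`, while a valid resolution such as `(300, 300)` passes through unchanged.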

Full Changelog: v4.7.1...v4.7.2
