github kreuzberg-dev/kreuzberg v4.4.2
Release v4.4.2

5 hours ago

Fixed

  • E2E element type assertions: Fixed element type field name in E2E generator templates for Python, TypeScript, WASM Deno, Elixir, Ruby, PHP, and C#
  • Ruby PDF annotation extraction: Fixed autoloading of PdfAnnotation and PdfAnnotationBoundingBox, and a bounding box field name mismatch
  • WASM OCR blocking event loop: OCR now runs in a worker thread, keeping the main thread responsive
  • JPEG 2000 OCR decode failure: All OCR backends now load images through a shared load_image_for_ocr() helper that uses the hayro-jpeg2000/hayro-jbig2 decoders
  • WASM PDF empty content: PDFium initialization is now properly awaited during initWasm()

Added

  • OMML-to-LaTeX math conversion for DOCX: Mathematical equations are now converted to LaTeX notation
  • Plain text output paths for all extractors: The DOCX, PPTX, ODT, FB2, DocBook, RTF, and Jupyter extractors now produce clean plain text when requested
  • cells_to_text() shared utility: Tab-separated plain text table formatter
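
As a rough illustration of what a tab-separated table formatter does, here is a minimal Python sketch. It is not the library's implementation: the release notes give only the name cells_to_text() and its purpose, so the signature, the list-of-rows input shape, and the whitespace handling below are all assumptions.

```python
def cells_to_text(cells: list[list[str]]) -> str:
    """Hypothetical sketch: format table rows as tab-separated plain text.

    Assumed input shape: a list of rows, each row a list of cell strings.
    The real utility's signature is not given in the release notes.
    """
    lines = []
    for row in cells:
        # Collapse internal whitespace (including tabs and newlines) so a
        # cell can never introduce a stray column or row separator.
        cleaned = [" ".join(cell.split()) for cell in row]
        lines.append("\t".join(cleaned))
    return "\n".join(lines)


table = [["Name", "Pages"], ["report.docx", "12"]]
print(cells_to_text(table))
```

Collapsing whitespace inside each cell is a design choice made for this sketch so that tabs and newlines stay unambiguous as separators; the actual utility may handle such cells differently.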

Changed

  • CLI includes all features: kreuzberg-cli now uses the full feature set, including archives

See CHANGELOG.md for full details.
