github kreuzberg-dev/kreuzberg v4.4.6

9 hours ago

Added

  • dBASE (.dbf) format support: Extract table data from dBASE files as markdown tables with field type support.
  • Hangul Word Processor (.hwp/.hwpx) support: Extract text content from HWP 5.0 documents (standard Korean document format).
  • Office template/macro format variants: Added support for .docm, .dotx, .dotm, .dot (Word), .potx, .potm, .pot (PowerPoint), .xltx, .xlt (Excel) formats.
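
As an illustration of the dBASE entry above, here is a minimal sketch of how tabular records could be rendered as a markdown table. It is not kreuzberg's implementation: the `rows_to_markdown` helper and the sample field names are hypothetical, and the records are assumed to have already been read from a .dbf file.

```python
from typing import Sequence


def rows_to_markdown(fields: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Render field names and records as a pipe-delimited markdown table."""

    def cell(value: object) -> str:
        # Missing values become empty cells; pipes are escaped and newlines
        # flattened so a value cannot break the table layout.
        text = "" if value is None else str(value)
        return text.replace("|", "\\|").replace("\n", " ")

    header = "| " + " | ".join(cell(name) for name in fields) + " |"
    divider = "| " + " | ".join("---" for _ in fields) + " |"
    body = ["| " + " | ".join(cell(v) for v in row) + " |" for row in rows]
    return "\n".join([header, divider, *body])


# Hypothetical records, as they might look after reading a .dbf file.
fields = ["NAME", "QTY", "ACTIVE"]
rows = [["Widget", 12, True], ["Gad|get", None, False]]
print(rows_to_markdown(fields, rows))
```

Escaping the pipe character matters here because field values in legacy database files are free text and may contain characters that markdown treats as column separators.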

Fixed

  • DOCX image placeholders missing (#484): Extracting .docx files with extract_images=True did not produce ![](image) placeholders in the output, because the default plain text output path stripped image references. Image extraction now forces markdown output so placeholders are always included.

Changed

  • Format count updated to 88+: Documentation across all READMEs, docs, and package manifests updated to reflect expanded format support (previously 75+).
