github kreuzberg-dev/kreuzberg v4.2.12


Fixed

DOCX Extraction

  • DOCX list items missing whitespace between text runs: Separate text runs (<w:r> elements) within the same paragraph were concatenated without spaces, so adjacent words merged together. Root cause: the XML parser was configured with trim_text(true), which stripped leading and trailing whitespace from <w:t> text elements. Fixed by vendoring the docx-lite parser into kreuzberg and switching it to trim_text(false), which preserves the original whitespace from the DOCX XML. (#359)
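To illustrate the failure mode, here is a minimal Rust sketch. It is not kreuzberg's or docx-lite's actual code. The hypothetical paragraph_text helper pulls the contents of each <w:t> element out of a paragraph fragment using plain string scanning, and its trim flag stands in for the parser's trim_text setting. The sample paragraph is also an assumption. In WordprocessingML, a run whose text begins or ends with a space is normally written with xml:space="preserve" on its <w:t> element, and that is the whitespace a trimming parser throws away. A real extractor should use a proper XML parser and decode entities.

```rust
/// Hypothetical helper (illustration only): concatenates the text of every
/// `<w:t>` element in a WordprocessingML paragraph fragment.
/// `trim == true` mimics a parser configured with trim_text(true).
/// Simplified string scanning: does not decode entities such as `&amp;`.
fn paragraph_text(xml: &str, trim: bool) -> String {
    let mut out = String::new();
    let mut rest = xml;
    while let Some(pos) = rest.find("<w:t") {
        let tail = &rest[pos + "<w:t".len()..];
        // Accept only `<w:t>` or `<w:t attr=...>`; skip `<w:tab/>`, `<w:tbl>`, etc.
        if !(tail.starts_with('>') || tail.starts_with(' ')) {
            rest = tail;
            continue;
        }
        // The element's text starts right after the end of the opening tag.
        let Some(open_end) = tail.find('>') else { break };
        let content = &tail[open_end + 1..];
        let Some(close) = content.find("</w:t>") else { break };
        let text = &content[..close];
        // Trimming discards the leading/trailing spaces that separate words across runs.
        out.push_str(if trim { text.trim() } else { text });
        rest = &content[close + "</w:t>".len()..];
    }
    out
}

fn main() {
    // Two runs in one paragraph; the separating space sits at the end of the first <w:t>.
    let para = r#"<w:p><w:r><w:t xml:space="preserve">Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"#;
    println!("{}", paragraph_text(para, true)); // Helloworld (the bug)
    println!("{}", paragraph_text(para, false)); // Hello world (the fix)
}
```

With trimming on, the trailing space of the first run is discarded and the two runs fuse into "Helloworld"; with trimming off, the space survives. Preserving whitespace at the parser level seems preferable to inserting a space between every run, since Word often splits a single word across several runs (for example, when only part of it is bold), and a blanket separator would break such words apart.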
