GitHub: kreuzberg-dev/kreuzberg v3.20.0

2 months ago

What's Changed

  • Migrate the HTML extractor and hOCR conversion to html-to-markdown 2.1 (Rust) bindings, removing the legacy BeautifulSoup-based path
  • Automatically capture inline data URI images and inline SVG assets when extract_images is enabled
  • Add Python 3.14 support for the core library, and document that EasyOCR, PaddleOCR, and spaCy-based entity extraction remain unavailable on 3.14 until their upstream projects publish compatible wheels
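
The notes don't say how the extractor locates these inline assets. As a rough illustration of what "inline data URI images and inline SVG assets" covers, here is a standalone sketch using only Python's standard library. It is not Kreuzberg's implementation (which now runs through the Rust html-to-markdown bindings) or its API; the names InlineAsset, decode_data_uri, and InlineAssetCollector are hypothetical.

```python
# Hypothetical sketch: NOT Kreuzberg's API or implementation.
from __future__ import annotations

import base64
import html
from dataclasses import dataclass
from html.parser import HTMLParser
from urllib.parse import unquote_to_bytes


@dataclass
class InlineAsset:
    mime_type: str
    data: bytes
    source: str  # "data-uri" or "inline-svg"


def decode_data_uri(uri: str) -> tuple[str, bytes] | None:
    """Split an RFC 2397 data URI into (mime_type, payload bytes); None if not one."""
    if not uri.startswith("data:") or "," not in uri:
        return None
    header, _, payload = uri[len("data:"):].partition(",")
    params = header.split(";")
    mime_type = params[0] or "text/plain"  # RFC 2397 default media type
    if "base64" in params[1:]:
        try:
            return mime_type, base64.b64decode(payload, validate=True)
        except ValueError:  # binascii.Error subclasses ValueError
            return None  # malformed payload: skip it instead of failing the document
    return mime_type, unquote_to_bytes(payload)  # percent-encoded payload


class InlineAssetCollector(HTMLParser):
    """Collect data-URI <img> payloads and re-serialized inline <svg> subtrees."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.assets: list[InlineAsset] = []
        self._svg_depth = 0  # > 0 while inside an <svg> element (handles nesting)
        self._svg_parts: list[str] = []

    def _capture_img(self, attrs: list[tuple[str, str | None]]) -> None:
        decoded = decode_data_uri(dict(attrs).get("src") or "")
        if decoded is not None:
            self.assets.append(InlineAsset(decoded[0], decoded[1], "data-uri"))

    def handle_starttag(self, tag, attrs):
        if tag == "svg":
            self._svg_depth += 1
        if self._svg_depth:
            self._svg_parts.append(self.get_starttag_text())  # raw text keeps attribute case
        elif tag == "img":
            self._capture_img(attrs)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <circle .../> or <img .../>
        if self._svg_depth:
            self._svg_parts.append(self.get_starttag_text())
        elif tag == "img":
            self._capture_img(attrs)

    def handle_data(self, data):
        if self._svg_depth:
            self._svg_parts.append(html.escape(data, quote=False))  # re-escape decoded text

    def handle_endtag(self, tag):
        if not self._svg_depth:
            return
        # HTMLParser lowercases tag names, so camelCase SVG closing tags
        # (e.g. </linearGradient>) come back lowercased here.
        self._svg_parts.append(f"</{tag}>")
        if tag == "svg":
            self._svg_depth -= 1
            if self._svg_depth == 0:
                markup = "".join(self._svg_parts)
                self.assets.append(InlineAsset("image/svg+xml", markup.encode("utf-8"), "inline-svg"))
                self._svg_parts = []


collector = InlineAssetCollector()
collector.feed(
    '<p>Logo: <img src="data:image/png;base64,iVBORw0KGgo=" alt="logo"></p>'
    '<svg width="10" height="10"><circle cx="5" cy="5" r="4"/></svg>'
    '<img src="https://example.com/remote.png">'  # remote image: not captured
)
collector.close()
for asset in collector.assets:
    print(asset.source, asset.mime_type, len(asset.data))
```

The sketch only picks up images embedded in the document itself (data URIs and inline <svg> markup); remotely referenced images are left alone. Because html.parser lowercases end-tag names, the re-serialized SVG is approximate, and a production extractor would use a full HTML/XML parser.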
