github kreuzberg-dev/kreuzberg v4.8.2

6 hours ago

Added

  • HtmlOutputConfig typed in all bindings. The html_output config field (themes, CSS classes, embed CSS, custom CSS, class prefix) is now fully typed in Python, TypeScript/Node, Go, Ruby, Elixir, PHP, Java, C#, R, and FFI. Previously it was only available in the Rust core.

Fixed

  • PDF: legitimate repeated content stripped during page merging regardless of strip_repeating_text flag. Previously, deduplicate_paragraphs() ran unconditionally, stripping brand names and other legitimately repeated content even when ContentFilterConfig.strip_repeating_text was false. Both deduplication passes are now gated behind the flag (#670, #681).
  • R package build failure. The R binding Cargo.toml version was stuck at 4.6.3 while core was at 4.8.1, causing a tokio version resolution failure. The version sync script now includes the R native extension Cargo.toml.
  • CI: PyPI publish action failure. Pinned pypa/gh-action-pypi-publish to v1.13.0 (v1.14.0 has a broken Docker image on GHCR).
  • E2E: Elixir generator emitted undefined is_nan/1 function. Added a helper function definition to the generated Elixir test helpers.
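
The PDF fix above comes down to gating a deduplication pass behind a config flag. The Python sketch below illustrates that pattern only; it is not the Rust implementation. ContentFilterSketch, dedupe_sketch, and merge_pages are hypothetical names, and the exact-match deduplication is an assumed stand-in for whatever deduplicate_paragraphs() actually does.

```python
from dataclasses import dataclass


@dataclass
class ContentFilterSketch:
    # Hypothetical stand-in for ContentFilterConfig; only the flag
    # named in the release note is modeled here.
    strip_repeating_text: bool = False


def dedupe_sketch(paragraphs: list[str]) -> list[str]:
    """Drop paragraphs already seen, keeping first occurrences in order.

    Assumed behavior for illustration; the real pass may differ.
    """
    seen: set[str] = set()
    kept: list[str] = []
    for paragraph in paragraphs:
        if paragraph not in seen:
            seen.add(paragraph)
            kept.append(paragraph)
    return kept


def merge_pages(pages: list[list[str]], config: ContentFilterSketch) -> list[str]:
    """Flatten per-page paragraphs, deduplicating only when the flag is set."""
    merged = [paragraph for page in pages for paragraph in page]
    if config.strip_repeating_text:
        return dedupe_sketch(merged)
    # Flag is off: repeated content such as brand names is preserved.
    return merged
```

With the flag off, a brand name that appears on every page survives the merge; with the flag on, only its first occurrence is kept.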
