github kreuzberg-dev/kreuzberg v4.3.5

5 hours ago

What's New

Bounding Box Support

  • bounding_box on Table and ExtractedImage: Spatial position data (BoundingBox with x0, y0, x1, y1) now available on both types across all 10 language bindings (Rust, Python, TypeScript, Ruby, PHP, Go, Java, C#, Elixir, WASM).
  • Table bounding boxes computed from PDF character positions: During PDF extraction, table bounding boxes are calculated from constituent character positions for precise spatial layout.
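
As a rough illustration of the idea behind the second bullet, a table's bounding box can be taken as the smallest rectangle enclosing the boxes of its characters. The sketch below is not Kreuzberg's implementation or API: the BoundingBox dataclass only mirrors the x0, y0, x1, y1 field names from the notes, union_of_boxes is a hypothetical helper, and it assumes (x0, y0) is the minimum corner and (x1, y1) the maximum corner.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class BoundingBox:
    # Field names follow the release notes; this class itself is illustrative only.
    x0: float
    y0: float
    x1: float
    y1: float


def union_of_boxes(boxes: Iterable[BoundingBox]) -> Optional[BoundingBox]:
    """Return the smallest box enclosing all given boxes, or None if there are none.

    Hypothetical helper: assumes (x0, y0) is the minimum corner and
    (x1, y1) the maximum corner of each box.
    """
    items = list(boxes)
    if not items:
        return None
    return BoundingBox(
        x0=min(b.x0 for b in items),
        y0=min(b.y0 for b in items),
        x1=max(b.x1 for b in items),
        y1=max(b.y1 for b in items),
    )


# Three character boxes belonging to one table (made-up coordinates).
chars = [
    BoundingBox(10, 20, 15, 30),
    BoundingBox(16, 20, 22, 31),
    BoundingBox(10, 5, 14, 15),
]
table_box = union_of_boxes(chars)
```

Here table_box spans from the smallest x0/y0 to the largest x1/y1 among the character boxes; the coordinate origin and units depend on the extractor and are not specified in these notes.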

Inline Markdown Embedding

  • Tables embedded inline in PDF markdown output: Tables appear at their correct vertical position instead of being appended at the end, with character deduplication to prevent text appearing both as paragraphs and inside tables.
  • Image placeholders in PDF markdown output: ![Image N (page P)](embedded:pP_iN) references are injected, with OCR text added as blockquotes when available.
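
For consumers that post-process the markdown, the placeholder pattern above can be located with a regular expression. This is a minimal sketch, not part of Kreuzberg: find_image_refs and ImageRef are hypothetical names, and it assumes N and P are plain decimal integers (the notes do not say whether numbering starts at 0 or 1).

```python
import re
from typing import List, NamedTuple

# Matches the placeholder form given in the notes: ![Image N (page P)](embedded:pP_iN)
# Assumption: N and P are decimal integers.
PLACEHOLDER = re.compile(
    r"!\[Image (\d+) \(page (\d+)\)\]\(embedded:(p\d+_i\d+)\)"
)


class ImageRef(NamedTuple):
    image: int   # N from the alt text
    page: int    # P from the alt text
    target: str  # the part after "embedded:", e.g. "p2_i1"


def find_image_refs(markdown: str) -> List[ImageRef]:
    """Collect all embedded-image placeholders in document order (hypothetical helper)."""
    return [
        ImageRef(int(m.group(1)), int(m.group(2)), m.group(3))
        for m in PLACEHOLDER.finditer(markdown)
    ]


sample = "Intro text\n\n![Image 1 (page 2)](embedded:p2_i1)\n\n> text recognized by OCR\n"
refs = find_image_refs(sample)
```

The target string could then be used to look up the matching extracted image, though how images are keyed in the extraction result is not stated in these notes.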

Bug Fixes

  • PHP FFI bridge bounding_box passthrough: Fixed the Rust-PHP bridge to properly convert bounding boxes instead of always returning null.
  • Pipeline test flakiness: Fixed test_pipeline_without_chunking and related tests that failed due to global processor cache poisoning in parallel execution.

See CHANGELOG.md for full details.
