github kreuzberg-dev/kreuzberg v4.2.9

6 hours ago

Fixes

MCP Server (#348)

  • Fixed nested runtime panic in Docker/MCP context: Resolved the "Cannot start a runtime from within a runtime" panic raised when using the extract_file tool via the MCP server in Docker. The MCP extraction tools were calling sync wrappers, which use GLOBAL_RUNTIME.block_on() from within the already-running Tokio runtime. The MCP tools now always use async extraction.
  • Removed unused async parameter from MCP tools: The async parameter on extract_file, extract_bytes, and batch_extract_files MCP tools has been removed since MCP always runs in an async context.
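The failure mode described above is not specific to Tokio. As a rough analogue only (this is Python asyncio, not the project's Rust code), the sketch below shows a sync wrapper that starts its own event loop failing when called from a handler that is already running inside one, and the fix of awaiting the async path directly. All function names here are hypothetical.

```python
import asyncio


async def extract_async(path: str) -> str:
    # Hypothetical stand-in for an async extraction routine.
    await asyncio.sleep(0)
    return f"text from {path}"


def extract_sync(path: str) -> str:
    # Sync wrapper: starts a fresh event loop, loosely comparable to
    # calling block_on() on a global runtime.
    coro = extract_async(path)
    try:
        return asyncio.run(coro)
    except RuntimeError:
        # asyncio.run() refuses to run inside an already-running loop.
        coro.close()  # avoid a "coroutine was never awaited" warning
        raise


async def tool_handler_buggy(path: str) -> str:
    # Already inside a running loop, so the sync wrapper fails.
    return extract_sync(path)


async def tool_handler_fixed(path: str) -> str:
    # Await the async path directly instead of nesting a second loop.
    return await extract_async(path)


async def main() -> None:
    try:
        await tool_handler_buggy("a.pdf")
    except RuntimeError as err:
        print("buggy handler failed:", err)
    print("fixed handler returned:", await tool_handler_fixed("a.pdf"))


if __name__ == "__main__":
    asyncio.run(main())
```

Since the handler always runs in an async context, a flag for choosing between sync and async paths has no useful setting, which is consistent with the removal of the async parameter.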

Python Bindings (#349)

  • Fixed Windows CLI binary not found: Fixed the "embedded binary not found" error on Windows. The build script now correctly handles the Windows .exe extension when copying the CLI binary into the wheel.
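A minimal sketch of the kind of platform-aware copy step this fix implies. The release notes do not show the actual build script, so the function names, the "example-cli" file stem, and the directory layout are all assumptions made for illustration.

```python
import shutil
import sys
from pathlib import Path


def binary_filename(stem: str, platform: str = sys.platform) -> str:
    # On Windows (sys.platform == "win32") executables carry an .exe suffix.
    return f"{stem}.exe" if platform.startswith("win") else stem


def copy_cli_binary(
    build_dir: Path,
    package_dir: Path,
    stem: str = "example-cli",  # hypothetical binary name
    platform: str = sys.platform,
) -> Path:
    # Resolve the platform-specific file name before looking it up, so the
    # Windows build does not search for a suffix-less file that never exists.
    name = binary_filename(stem, platform)
    source = build_dir / name
    if not source.is_file():
        raise FileNotFoundError(f"CLI binary not found: {source}")
    package_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, package_dir / name))
```

Passing the platform as a parameter keeps the helper testable on any host; a real build script would more likely read it from the build environment.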

OCR Heuristic

  • Pass actual page count to OCR fallback evaluator: evaluate_native_text_for_ocr was called with None for page count, defaulting to 1. This inflated per-page averages for multi-page documents, causing scanned PDFs to skip OCR.
  • Per-page OCR evaluation for mixed-content PDFs: Added evaluate_per_page_ocr which evaluates each page independently using page boundaries. If any single page triggers OCR fallback, the entire document is OCR'd.
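Both OCR fixes can be illustrated with a small sketch. It assumes the heuristic compares the amount of native text per page against a threshold; the threshold value, the function names, and the boundary representation below are hypothetical and are not the signatures of evaluate_native_text_for_ocr or evaluate_per_page_ocr.

```python
from typing import Optional, Sequence, Tuple

MIN_CHARS_PER_PAGE = 100  # hypothetical threshold, for illustration only


def needs_ocr_whole_doc(text: str, page_count: Optional[int] = None) -> bool:
    # A missing page count falls back to 1, which is the behavior that
    # inflated the per-page average for multi-page documents.
    pages = page_count or 1
    return len(text.strip()) / pages < MIN_CHARS_PER_PAGE


def needs_ocr_per_page(text: str, boundaries: Sequence[Tuple[int, int]]) -> bool:
    # Each (start, end) pair is one page's character range. If any single
    # page has too little native text, the whole document is sent to OCR.
    return any(
        len(text[start:end].strip()) < MIN_CHARS_PER_PAGE
        for start, end in boundaries
    )


if __name__ == "__main__":
    # Scanned 10-page PDF with only 300 characters of native text.
    scanned = "x" * 300
    print(needs_ocr_whole_doc(scanned))      # page count unknown: treated as 1 page
    print(needs_ocr_whole_doc(scanned, 10))  # actual page count: 30 chars per page

    # Mixed document: page 1 has native text, page 2 is a scanned image.
    mixed = "a" * 2000
    print(needs_ocr_whole_doc(mixed, 2))
    print(needs_ocr_per_page(mixed, [(0, 2000), (2000, 2000)]))
```

Under these assumptions, the scanned document skips OCR only when the page count is missing, and the mixed document is caught only by the per-page check because its document-wide average stays above the threshold.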

Full Changelog

v4.2.8...v4.2.9
