Fixes
MCP Server (#348)
- Fixed nested runtime panic in Docker/MCP context: Resolved the "Cannot start a runtime from within a runtime" panic when using the extract_file tool via the MCP server in Docker. The MCP extraction tools were calling sync wrappers that use GLOBAL_RUNTIME.block_on() from within the already-running Tokio runtime. The MCP tools now always use async extraction.
- Removed unused async parameter from MCP tools: The async parameter on the extract_file, extract_bytes, and batch_extract_files MCP tools has been removed, since MCP always runs in an async context.
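The nested-runtime panic above comes from Tokio, but Python's asyncio has the same failure mode and runs with only the standard library, so it is used here as an analog rather than a reproduction. A synchronous wrapper that starts its own event loop breaks when it is called from code already running on one. The sketch contrasts the old pattern with the fixed one, in which the async tool handler simply awaits the async extraction. All function names here (extract_async, extract_sync, old_tool_handler, new_tool_handler) are hypothetical and are not the project's API.

```python
import asyncio


async def extract_async(path: str) -> str:
    """Hypothetical async extractor standing in for the real one."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return f"text of {path}"


def extract_sync(path: str) -> str:
    """Hypothetical sync wrapper that drives its own event loop.

    Analogous to GLOBAL_RUNTIME.block_on() in Rust: fine from plain
    synchronous code, but it fails if a loop is already running.
    """
    coro = extract_async(path)
    try:
        return asyncio.run(coro)
    except RuntimeError:
        coro.close()  # avoid a "coroutine was never awaited" warning
        raise


async def old_tool_handler(path: str) -> str:
    # Old pattern: a blocking sync wrapper called inside an async handler.
    return extract_sync(path)


async def new_tool_handler(path: str) -> str:
    # Fixed pattern: the handler is already async, so it awaits directly.
    return await extract_async(path)


def run_demo() -> tuple[str, str]:
    """Run both handlers and report what each one produced."""
    try:
        asyncio.run(old_tool_handler("doc.pdf"))
        old_result = "no error"
    except RuntimeError as exc:
        old_result = f"RuntimeError: {exc}"
    new_result = asyncio.run(new_tool_handler("doc.pdf"))
    return old_result, new_result


if __name__ == "__main__":
    old, new = run_demo()
    print(old)
    print(new)
```

Because the handler always runs inside an event loop, a per-call flag choosing between sync and async paths has nothing left to select. That is why dropping the async parameter follows naturally from the fix.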
Python Bindings (#349)
- Fixed Windows CLI binary not found: Fixed the "embedded binary not found" error on Windows. The build script now correctly handles the Windows .exe extension when copying the CLI binary into the wheel.
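The core of this fix is choosing the right file name per platform. A minimal sketch follows, assuming a hypothetical helper cli_binary_name and a placeholder base name; the project's actual build script may be structured differently.

```python
import sys


def cli_binary_name(base: str, platform: str = sys.platform) -> str:
    """Return the on-disk file name of a CLI binary for a platform.

    Windows executables carry a .exe suffix; Linux and macOS
    binaries have no suffix. `platform` uses sys.platform values.
    """
    # Hypothetical helper for illustration only.
    return f"{base}.exe" if platform == "win32" else base


if __name__ == "__main__":
    for plat in ("win32", "linux", "darwin"):
        print(plat, cli_binary_name("mycli", plat))
```

Taking the platform as a parameter rather than always reading the current interpreter's value matters when building wheels for another platform. The suffix should follow the target platform, not the machine running the build.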
OCR Heuristic
- Pass actual page count to OCR fallback evaluator: evaluate_native_text_for_ocr was called with None for the page count, which defaulted to 1. This inflated per-page averages for multi-page documents, causing scanned PDFs to skip OCR.
- Per-page OCR evaluation for mixed-content PDFs: Added evaluate_per_page_ocr, which evaluates each page independently using page boundaries. If any single page triggers the OCR fallback, the entire document is OCR'd.
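Both OCR heuristic fixes can be illustrated with a simplified sketch. Several details are assumptions for illustration only and do not reflect what evaluate_native_text_for_ocr and evaluate_per_page_ocr actually compute:

- a single minimum-characters-per-page threshold,
- the helper names needs_ocr and needs_ocr_per_page,
- page boundaries represented as start offsets into the extracted text.

```python
from typing import Optional, Sequence

# Hypothetical threshold: pages averaging fewer native-text characters
# than this are treated as likely scanned and in need of OCR.
MIN_CHARS_PER_PAGE = 100


def needs_ocr(text: str, page_count: Optional[int]) -> bool:
    """Document-level check on the average characters per page.

    A missing page count falls back to 1, which reproduces the old bug:
    all of the text is attributed to a single page.
    """
    pages = page_count or 1
    return len(text) / pages < MIN_CHARS_PER_PAGE


def needs_ocr_per_page(text: str, boundaries: Sequence[int]) -> bool:
    """Per-page check: OCR the whole document if any one page is sparse.

    `boundaries` holds the start offset of each page within `text`.
    """
    ends = list(boundaries[1:]) + [len(text)]
    return any(
        needs_ocr(text[start:end], 1) for start, end in zip(boundaries, ends)
    )


if __name__ == "__main__":
    # Ten scanned pages, each carrying ~50 characters of native text.
    scanned = "x" * 500
    print(needs_ocr(scanned, None))  # False: old behaviour, OCR skipped
    print(needs_ocr(scanned, 10))    # True: 50 characters per page

    # Three pages: two text-heavy pages and one nearly empty scanned page.
    mixed = "a" * 400 + "b" * 400 + "c" * 10
    print(needs_ocr(mixed, 3))                       # False: average is 270
    print(needs_ocr_per_page(mixed, [0, 400, 800]))  # True: page 3 is sparse
```

The second case shows why a document-wide average is not enough for mixed-content PDFs. Two dense pages can mask a scanned page, while the per-page check still catches it.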