Fixed
- Archive extraction SIGBUS crash on macOS ARM64: ZIP, 7Z, TAR, and GZIP archive extraction crashed with SIGBUS (signal 10) in release builds due to miscompilation of unsafe code in the `sevenz-rust2` and `zip` crates under `opt-level=3`. Reduced the optimization level to 2 for these crates. This also fixes Elixir, R, Go, and C benchmark crashes when processing archive files.
- Native-text PDF extraction fails when OCR backend unavailable (#646): PDFs with extractable native text hard-failed with `ParsingError: All OCR pipeline backends failed` when no OCR backend (PaddleOCR/Tesseract) was installed, even though pdfium had already extracted the text successfully. The automatic OCR quality-enhancement pass now falls back gracefully to the native extraction result when OCR backends are unavailable, emitting a warning instead of failing.
- Elixir Logger pollutes stdout: Elixir benchmark scripts produced `[debug] Initialized Kreuzberg.Plugin.Registry` on stdout, corrupting JSON output. The Logger default handler is now configured to write to stderr via `config :logger, :default_handler`.
- WASM benchmark module resolution: The WASM benchmark script failed to load `@kreuzberg/wasm` through the pnpm virtual store due to `import.meta.url` resolution issues in tsx. Changed to a direct import from the local build path.
- CI: FFI-dependent tests fail when FFI build skipped: Go, Elixir, R, C FFI, and CLI test jobs ran and failed when `build-ffi` was skipped by paths-filter. Added a `needs.build-ffi.result == 'success'` guard.
- `Rust cannot catch foreign exceptions` crash (#606): C++ exceptions from Tesseract or Leptonica (e.g. on corrupted images or edge-case inputs) propagated across the FFI boundary unhandled, causing `fatal runtime error: Rust cannot catch foreign exceptions, aborting`. All Tesseract/Leptonica FFI declarations now use `extern "C-unwind"` to allow foreign exceptions to unwind safely, and OCR processing is wrapped with `catch_unwind` to convert them to recoverable errors.
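
The fallback described in the native-text PDF entry (#646) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `OcrError`, `Extraction`, and `enhance_with_ocr` are hypothetical names, and the OCR pass is modelled as a plain closure.

```rust
/// Hypothetical error type for this sketch (not the library's real type).
#[derive(Debug, PartialEq)]
enum OcrError {
    /// No OCR backend is installed or loadable.
    BackendUnavailable,
    /// A backend ran but failed on this input.
    Failed(String),
}

/// Hypothetical result: the chosen text plus an optional warning.
#[derive(Debug, PartialEq)]
struct Extraction {
    text: String,
    warning: Option<String>,
}

/// Runs the OCR enhancement pass, falling back to already-extracted
/// native text when no backend is available.
fn enhance_with_ocr<F>(native_text: &str, run_ocr: F) -> Result<Extraction, OcrError>
where
    F: FnOnce() -> Result<String, OcrError>,
{
    match run_ocr() {
        // OCR succeeded: use its output.
        Ok(ocr_text) => Ok(Extraction { text: ocr_text, warning: None }),
        // No backend, but native text exists: keep it and warn.
        Err(OcrError::BackendUnavailable) if !native_text.trim().is_empty() => Ok(Extraction {
            text: native_text.to_string(),
            warning: Some("OCR backends unavailable; using native text".to_string()),
        }),
        // Nothing to fall back to, or a genuine OCR failure: propagate.
        Err(err) => Err(err),
    }
}
```

In this sketch only the "backend unavailable" case is downgraded to a warning; other OCR failures still surface as errors, which is an assumption about the intended behavior rather than something the entry states.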
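
The shape of the `extern "C-unwind"` plus `catch_unwind` wrapper from the #606 entry might look like the sketch below. It assumes Rust 1.71 or later, where the `"C-unwind"` ABI is stable. `fake_ocr_call` and `safe_ocr` are hypothetical stand-ins: the "foreign" function is written in Rust and raises a Rust panic so the example runs without Tesseract. A real C++ exception reaching `catch_unwind` is less strictly specified by the standard library than a Rust panic, so this shows only the structure of the wrapper.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Hypothetical stand-in for a Tesseract/Leptonica entry point.
/// The "C-unwind" ABI permits unwinding to cross this boundary;
/// with plain "C", an unwind escaping the function aborts the process.
extern "C-unwind" fn fake_ocr_call(input_len: usize) -> usize {
    if input_len == 0 {
        // Simulates the foreign library raising on bad input.
        panic!("simulated failure inside OCR call");
    }
    input_len * 2
}

/// Wraps the call so an unwind becomes a recoverable error value.
fn safe_ocr(input_len: usize) -> Result<usize, String> {
    catch_unwind(AssertUnwindSafe(|| fake_ocr_call(input_len))).map_err(|payload| {
        // Rust panic payloads are usually &str or String; anything else
        // is reported generically.
        payload
            .downcast_ref::<&str>()
            .map(|s| s.to_string())
            .or_else(|| payload.downcast_ref::<String>().cloned())
            .unwrap_or_else(|| "unknown unwind payload".to_string())
    })
}
```

Calling `safe_ocr(0)` returns an `Err` carrying the message instead of terminating the process, while valid input passes through unchanged.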