Fixed
- #790: Fix GPU acceleration; kreuzberg now bundles CPU-only ONNX Runtime by default (zero-config). When a GPU execution provider (`cuda`, `tensorrt`, `coreml`) is explicitly requested via `AccelerationConfig` but unavailable, kreuzberg returns an error with setup instructions instead of silently falling back to CPU. `Auto` mode gracefully falls back to CPU with an info log. For GPU support, set `ORT_DYLIB_PATH` to a GPU-enabled ONNX Runtime.
- #791: Fix DOCX OCR extraction; OCR now runs on embedded images before document rendering, and OCR text is injected into the rendered output. Previously, OCR results were discarded and replaced with placeholder text.
- #783: PaddleOCR backend not utilizing GPU (CUDA) despite `AccelerationConfig`. The `AccelerationConfig` from `ExtractionConfig` never reached the PaddleOCR ONNX sessions, so they silently fell back to CPU. Acceleration is now propagated through `OcrConfig` to all OCR call sites (image extractor, PDF OCR).
- #779: Expose `PaddleOcrConfig` in Python bindings and update `OcrConfig` for backward compatibility.
- #792: Fix Ruby gem packaging; exclude staged `libpdfium.dylib` from gem artifacts by narrowing the native extension glob to only include the compiled `kreuzberg_rb.*` extension.
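
The provider-selection behavior described in #790 can be sketched roughly as follows. This is an illustrative Python sketch, not kreuzberg's implementation: `Provider`, `ProviderUnavailableError`, `resolve_provider`, and the order in which `Auto` mode tries GPU providers are hypothetical names and assumptions. Only the provider names and `ORT_DYLIB_PATH` come from the entry itself.

```python
from __future__ import annotations

import logging
from enum import Enum

logger = logging.getLogger("acceleration_sketch")


class Provider(Enum):
    """Hypothetical stand-in for the provider choices named in #790."""
    AUTO = "auto"
    CPU = "cpu"
    CUDA = "cuda"
    TENSORRT = "tensorrt"
    COREML = "coreml"


class ProviderUnavailableError(RuntimeError):
    """Hypothetical error for an explicitly requested but missing provider."""


# Order in which Auto mode tries GPU providers (an assumption of this sketch).
GPU_PROVIDERS = (Provider.CUDA, Provider.TENSORRT, Provider.COREML)


def resolve_provider(requested: Provider, available: set[Provider]) -> Provider:
    """Pick the provider to use, given what the loaded runtime offers."""
    if requested is Provider.CPU:
        return Provider.CPU

    if requested is Provider.AUTO:
        # Auto mode: use a GPU provider if one exists, else fall back quietly.
        for candidate in GPU_PROVIDERS:
            if candidate in available:
                return candidate
        logger.info("No GPU execution provider available; falling back to CPU")
        return Provider.CPU

    # Explicit GPU request: fail loudly instead of silently using the CPU.
    if requested in available:
        return requested
    raise ProviderUnavailableError(
        f"Execution provider '{requested.value}' was requested but is not "
        "available. Set ORT_DYLIB_PATH to a GPU-enabled ONNX Runtime."
    )
```

With this policy, `resolve_provider(Provider.AUTO, set())` returns `Provider.CPU` after logging at info level, while `resolve_provider(Provider.CUDA, set())` raises an error whose message points at `ORT_DYLIB_PATH`.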
Added
- GPU CI workflow (`ci-gpu.yaml`) targeting self-hosted GPU runners with NVIDIA GPUs.
- Comprehensive GPU integration tests covering all ORT-accelerated paths: PaddleOCR (det/cls/rec), layout detection (RT-DETR), embeddings, document orientation detection, and end-to-end extraction.