kreuzberg-dev/kreuzberg v3.13.3
on GitHub

3 months ago

🐛 Bug Fixes

Critical Regression Fixes

Fixed PDF extraction failures that caused "ExceptionGroup: unhandled errors in a TaskGroup" errors
Fixed XLS file extraction failures with "File not found 'xl/_rels/workbook.xml.rels'" errors
Fixed Tesseract OCR configuration to handle both enum and integer PSM values

Test Suite Improvements

Fixed DataFrame API compatibility: converted tests to use Polars instead of Pandas for consistency
Fixed config file loading for arbitrary TOML files with [tool.kreuzberg] sections
Fixed API config caching with nested dictionaries

🔧 Technical Details

Root Cause Analysis

XLS Error: SpreadSheetExtractor hardcoded the .xlsx extension for all spreadsheet files, causing python-calamine to fail when parsing .xls files
Tesseract PSM Error: The code expected PSM to be an enum with a .value attribute, but the configuration could provide plain integers
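
Both root causes can be sketched in a few lines. This is a minimal illustration, not the code from the fix: PSMMode, MIME_TO_EXTENSION, extension_for, and psm_value are hypothetical names, and the .xlsx fallback for unknown MIME types is an assumption.

```python
from enum import Enum
from typing import Union


class PSMMode(Enum):
    """Hypothetical stand-in for the library's page segmentation mode enum."""

    AUTO = 3          # Tesseract: fully automatic page segmentation, no OSD
    SINGLE_BLOCK = 6  # Tesseract: assume a single uniform block of text


# Map spreadsheet MIME types to the extension the parser should see,
# instead of assuming .xlsx for every file.
MIME_TO_EXTENSION = {
    "application/vnd.ms-excel": ".xls",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": ".xlsx",
    "application/vnd.oasis.opendocument.spreadsheet": ".ods",
}


def extension_for(mime_type: str, default: str = ".xlsx") -> str:
    """Pick a file extension from the MIME type (the fallback is an assumption)."""
    return MIME_TO_EXTENSION.get(mime_type, default)


def psm_value(psm: Union[PSMMode, int]) -> int:
    """Accept either an enum member or a plain integer and return the integer."""
    if isinstance(psm, Enum):
        return int(psm.value)
    return int(psm)
```

With this shape, extension_for("application/vnd.ms-excel") selects .xls, and psm_value accepts PSMMode.SINGLE_BLOCK and the integer 6 interchangeably.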

Changes Made

Added a mapping from MIME type to file extension in SpreadSheetExtractor
Updated Tesseract OCR to handle both enum and integer PSM values
Ensured consistent use of Polars DataFrames throughout the codebase (except GMFT, which uses Pandas internally)
Fixed configuration loading for non-standard TOML file names
Added hashable conversion for nested config dictionaries in API caching
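
The hashable conversion in the last item can be pictured as follows. This is a sketch under assumptions, not the project's implementation: to_hashable is a hypothetical name, and sorting dictionary items by stringified key is one possible way to make the result independent of insertion order.

```python
from typing import Any, Hashable


def to_hashable(value: Any) -> Hashable:
    """Recursively convert nested dicts, lists, and sets into hashable values.

    Hypothetical helper: dicts become tuples of (key, value) pairs sorted by
    key, so two configs with the same content produce the same cache key.
    """
    if isinstance(value, dict):
        items = ((str(k), to_hashable(v)) for k, v in value.items())
        # Sort on the key only, so values never need to be comparable.
        return tuple(sorted(items, key=lambda kv: kv[0]))
    if isinstance(value, (list, tuple)):
        return tuple(to_hashable(v) for v in value)
    if isinstance(value, (set, frozenset)):
        return frozenset(to_hashable(v) for v in value)
    # Scalars (str, int, bool, None, ...) are assumed to be hashable already.
    return value
```

A nested config such as {"ocr": {"language": "eng", "psm": 6}} can then be passed through to_hashable and used as a dictionary key or as an argument to a memoized function, which a raw nested dictionary cannot.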

📝 Testing

Added comprehensive regression tests using actual user data files
Added API tests for Docker container configuration patterns
All existing tests continue to pass

🔄 Compatibility

This release maintains full backwards compatibility while fixing critical regressions introduced after v3.12.

What's Changed

chore(deps): bump actions/setup-python from 5 to 6 by @dependabot[bot] in #124
fix: resolve regression in PDF extraction and XLS file handling by @Goldziher in #127

Full Changelog: v3.13.2...v3.13.3
