github kreuzberg-dev/kreuzberg v4.0.0-rc.14

pre-release, 15 hours ago

Added

  • Comprehensive test suites for all language bindings:
    • Python: 34 tests covering type verification, batch APIs, byte extraction, MIME detection, OCR, all file types
    • Node.js: 79 tests for type verification, batch APIs, MIME detection, configuration, error handling
    • Ruby: 55 RSpec tests for batch APIs, byte extraction, type verification, configuration handling
    • Java: 85 JUnit 5 tests with FFM API memory management, concurrency, all file format support
    • WASM: 79 tests with performance validation, large document handling, concurrent operations
    • Go: 63 table-driven tests with context support, error wrapping with errors.Is(), all file types
    • All test suites verify batch extraction APIs (sync/async), type safety, result structure validation

Fixed

  • NuGet publish workflow reliability: Replaced NuGet/login OIDC-based authentication with direct API key approach
    • Issue: NuGet/login action could fail with 401 errors due to OIDC token context limitations (see NuGet/login#6)
    • Solution: Removed the NuGet/login step; the API key is now passed directly via the NUGET_AUTH_TOKEN environment variable and the --api-key parameter
    • Impact: More reliable C# package publishing without dependency on OIDC token exchange
    • Requires: NUGET_API_KEY secret to be configured in GitHub repository settings
  • LibreOffice installation in Docker full image: Updated LibreOffice from 25.8.2 to 25.8.4
    • Version 25.8.2 download URLs were no longer available on DocumentFoundation servers
    • Updated to latest stable release 25.8.4 (released Dec 18, 2025)
    • Verified working for Office document extraction (DOCX, XLSX, ODT)
    • Tested on both x86_64 and aarch64 architectures
  • Python IDE type completions: Fixed missing type hints in IDE autocomplete
    • Root cause: _internal_bindings.pyi stub file was not being included in wheel distribution
    • Solution: Added ensure_stub_file() function in packages/python/build.py to verify and include stub file in all build outputs
    • Impact: Full autocomplete now works for all 67 public APIs, type checkers can find definitions, and the stubs are compatible with mypy strict mode
  • Ruby gem native extension compilation: Fixed vendoring of Rust crates during build
    • Added automatic vendoring task to packages/ruby/Rakefile that runs before compilation
    • Ensures vendor/kreuzberg, vendor/kreuzberg-ffi, and vendor/kreuzberg-tesseract are properly copied and version-updated before building native extension
  • Python ExtractionResult.pages type hints: Fixed missing type definition in PyO3 stub file
    • Root cause: _internal_bindings.pyi was missing pages field declaration in ExtractionResult class
    • Added pages: list[PageContent] | None attribute and PageContent TypedDict definition
    • Impact: IDEs now properly show autocomplete for result.pages, type checkers recognize the attribute
    • Reduces confusion around TypeError: 'NoneType' object is not iterable, which is raised when users iterate over result.pages without first checking for None
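
Because pages is typed as list[PageContent] | None, callers need a None check before iterating. The sketch below illustrates that pattern with stand-in types rather than the real kreuzberg classes. The PageContent keys (page_number, content) and the page_texts helper are assumptions made for illustration and are not taken from the library.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, TypedDict


class PageContent(TypedDict):
    # Hypothetical keys for illustration; the real PageContent may differ.
    page_number: int
    content: str


@dataclass
class ExtractionResult:
    # Stand-in for kreuzberg's ExtractionResult, reduced to the fields used here.
    content: str
    pages: Optional[list[PageContent]] = None


def page_texts(result: ExtractionResult) -> list[str]:
    """Return the text of each page, or an empty list when pages is None."""
    if result.pages is None:
        # Iterating over None would raise:
        # TypeError: 'NoneType' object is not iterable
        return []
    return [page["content"] for page in result.pages]
```

With the corrected stub, a type checker can flag an unguarded `for page in result.pages` loop, since the attribute is declared as possibly None.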
