github kreuzberg-dev/kreuzberg v4.5.3
Release v4.5.3

7 hours ago

What's New

SLANeXT Table Structure Recognition

Adds alternative table structure recognition backends alongside TATR. The new table_model field on LayoutDetectionConfig selects the backend:

Model            | Config Value      | Size    | Best For
TATR             | "tatr" (default)  | 30 MB   | General-purpose, consistent results
SLANeXT Wired    | "slanet_wired"    | 365 MB  | Bordered/gridlined tables
SLANeXT Wireless | "slanet_wireless" | 365 MB  | Borderless tables
SLANeXT Auto     | "slanet_auto"     | ~737 MB | Mixed documents (auto-classifies)
SLANet-plus      | "slanet_plus"     | 7.78 MB | Resource-constrained environments

Available across all 12 language bindings and CLI (--layout-table-model).
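
As a rough illustration of how the table above might guide a choice, here is a minimal Python sketch. The helper pick_table_model, its table_style labels, and the fallback order are hypothetical and not part of kreuzberg; only the config values and sizes come from the table.

```python
from typing import Optional

# Approximate model sizes in MB, copied from the release-notes table.
TABLE_MODEL_SIZES_MB = {
    "tatr": 30.0,
    "slanet_wired": 365.0,
    "slanet_wireless": 365.0,
    "slanet_auto": 737.0,
    "slanet_plus": 7.78,
}


def pick_table_model(table_style: str = "unknown",
                     budget_mb: Optional[float] = None) -> str:
    """Hypothetical helper: map a document's table style to a table_model value.

    table_style is an illustrative label ("bordered", "borderless", "mixed");
    anything else falls back to the default "tatr".
    """
    preferred = {
        "bordered": "slanet_wired",
        "borderless": "slanet_wireless",
        "mixed": "slanet_auto",
    }.get(table_style, "tatr")

    # No size limit, or the preferred model fits: use it.
    if budget_mb is None or TABLE_MODEL_SIZES_MB[preferred] <= budget_mb:
        return preferred

    # Otherwise try the smaller general-purpose options, largest first.
    for fallback in ("tatr", "slanet_plus"):
        if TABLE_MODEL_SIZES_MB[fallback] <= budget_mb:
            return fallback

    raise ValueError(f"no table model fits within {budget_mb} MB")
```

The returned string is the kind of value that would be set as table_model on LayoutDetectionConfig or passed to the CLI via --layout-table-model.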

Apple iWork Format Support

Native parsing for .pages, .numbers, and .key files (2013+ format) via protobuf text extraction from Snappy-compressed IWA containers.
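
To make the container layout concrete, the sketch below lists the IWA entries inside an iWork document using only the Python standard library. It assumes the document is a single-file ZIP archive (some iWork documents are directory packages, which this does not handle), and it stops at listing entries: it does not decompress the Snappy data or decode the protobuf messages, and it is not kreuzberg's implementation. The function name list_iwa_members is hypothetical.

```python
import zipfile
from typing import BinaryIO, List, Union


def list_iwa_members(source: Union[str, BinaryIO]) -> List[str]:
    """Return the sorted names of .iwa entries in a ZIP-based iWork file.

    source may be a filesystem path or a seekable binary file object.
    Assumption: the document is a plain ZIP archive.
    """
    with zipfile.ZipFile(source) as archive:
        return sorted(
            name for name in archive.namelist() if name.endswith(".iwa")
        )
```

Each listed entry would then need Snappy decompression and protobuf decoding before any text could be recovered.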

Other Changes

  • PP-LCNet table classifier for automatic wired/wireless table detection
  • CLI cache warm --all-table-models for opt-in SLANeXT download (~730 MB)
  • ISO 21111-10 benchmark fixture with MinerU ground truth
  • Format count updated to 91+

See CHANGELOG.md for full details.
