What's New
SLANeXT Table Structure Recognition
Adds alternative table structure recognition backends alongside TATR. A new `table_model` field on `LayoutDetectionConfig` selects the backend:
| Model | Config Value | Size | Best For |
|---|---|---|---|
| TATR | `"tatr"` (default) | 30 MB | General-purpose, consistent results |
| SLANeXT Wired | `"slanet_wired"` | 365 MB | Bordered/gridlined tables |
| SLANeXT Wireless | `"slanet_wireless"` | 365 MB | Borderless tables |
| SLANeXT Auto | `"slanet_auto"` | ~737 MB | Mixed documents (auto-classifies) |
| SLANet-plus | `"slanet_plus"` | 7.78 MB | Resource-constrained environments |
Available across all 12 language bindings and the CLI (`--layout-table-model`).
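As a rough guide to choosing among these values, the sketch below maps a table style and a download budget to one of the config strings from the table. The helper `pick_table_model`, its parameters, and its fallback order are hypothetical and not part of the library; only the config values and sizes come from the table above.

```python
from typing import Optional

# Approximate model sizes in MB, copied from the table above.
TABLE_MODEL_SIZES_MB = {
    "tatr": 30.0,
    "slanet_wired": 365.0,
    "slanet_wireless": 365.0,
    "slanet_auto": 737.0,
    "slanet_plus": 7.78,
}


def pick_table_model(bordered: Optional[bool], budget_mb: float) -> str:
    """Hypothetical helper: suggest a table_model value.

    bordered: True for gridlined tables, False for borderless tables,
              None when the documents are mixed or unknown.
    budget_mb: largest model download the environment can accept.
    """
    if bordered is None:
        preferred = "slanet_auto"
    elif bordered:
        preferred = "slanet_wired"
    else:
        preferred = "slanet_wireless"

    # Assumed fallback order: the specialised model first, then the
    # general-purpose default, then the smallest model.
    for candidate in (preferred, "tatr", "slanet_plus"):
        if TABLE_MODEL_SIZES_MB[candidate] <= budget_mb:
            return candidate
    raise ValueError(f"no table model fits within {budget_mb} MB")
```

The returned string is what would be assigned to `table_model` on `LayoutDetectionConfig` or passed to `--layout-table-model`.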
Apple iWork Format Support
Native parsing for .pages, .numbers, and .key files (2013+ format) via protobuf text extraction from Snappy-compressed IWA containers.
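To illustrate where the IWA containers sit, here is a minimal sketch that lists the `.iwa` entries of an iWork file, assuming the file is a single ZIP archive (a common layout for 2013+ documents, but an assumption here). The function `list_iwa_entries` is hypothetical and is not the parser's API. It does not decompress the Snappy data or decode any protobuf messages.

```python
import io
import zipfile


def list_iwa_entries(data: bytes) -> list[str]:
    """Return the sorted names of .iwa entries in a ZIP-packaged iWork file.

    Assumes `data` holds the raw bytes of a .pages, .numbers, or .key
    file stored as a ZIP archive; raises zipfile.BadZipFile otherwise.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return sorted(
            name for name in archive.namelist() if name.endswith(".iwa")
        )
```

Each listed entry would still need Snappy decompression before any text could be extracted from its protobuf payload.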
Other Changes
- PP-LCNet table classifier for automatic wired/wireless table detection
- CLI `cache warm --all-table-models` for opt-in SLANeXT download (~730 MB)
- ISO 21111-10 benchmark fixture with MinerU ground truth
- Format count updated to 91+
See CHANGELOG.md for full details.