Changed
API Alignment - 1:1 Parity Across All Bindings
- Strict API parity enforcement: All 9 language bindings now have exact 1:1 field parity with Rust core
  - Verification script (scripts/verify_api_parity.py) now runs in STRICT mode, failing on ANY field differences
  - Added to ci-validate.yaml workflow to prevent future API drift
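As an illustration of what a strict field-parity check involves, here is a minimal Python sketch. It is not the contents of scripts/verify_api_parity.py: the names `ParityReport`, `to_snake_case`, and `check_parity` are hypothetical, and it assumes field names are compared after normalizing camelCase and PascalCase to snake_case.

```python
import re
from dataclasses import dataclass


def to_snake_case(name: str) -> str:
    """Normalize camelCase or PascalCase field names to snake_case."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


@dataclass(frozen=True)
class ParityReport:
    """Hypothetical result type: field differences for one binding."""
    binding: str
    missing: frozenset  # canonical fields the binding lacks
    extra: frozenset    # binding fields absent from the canonical set

    @property
    def ok(self) -> bool:
        # Strict mode: any difference at all is a failure.
        return not self.missing and not self.extra


def check_parity(binding: str, canonical: set, actual: set) -> ParityReport:
    """Compare a binding's field names against the canonical field names."""
    canon = {to_snake_case(f) for f in canonical}
    found = {to_snake_case(f) for f in actual}
    return ParityReport(binding, frozenset(canon - found), frozenset(found - canon))


if __name__ == "__main__":
    report = check_parity(
        "php",
        canonical={"use_cache", "enable_quality_processing"},
        actual={"useCache", "enableQualityProcessing", "extractImages"},
    )
    print(report.ok, sorted(report.extra))
```

In a CI job, a non-`ok` report for any binding would translate into a non-zero exit code, which is how such a check blocks drift.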
PHP Bindings
- ExtractionConfig alignment: Removed 5 non-canonical fields and fixed defaults
  - Removed: embedding, extractImages, extractTables, preserveFormatting, outputEncoding
  - Fixed defaults: useCache → true, enableQualityProcessing → true, maxConcurrentExtractions → null
  - Updated ExtractionConfigBuilder to match canonical API
  - All 16 fields now match Rust canonical source exactly
Go Bindings
- ExtractionResult alignment: Removed Success field (not in Rust canonical)
- PageInfo alignment: Removed Visible and ContentType fields (not in Rust canonical)
- Updated 14 test files to remove references to removed fields
Ruby Bindings
- Default value fix: Changed enable_quality_processing default from false to true to match Rust
Java Bindings
- Default value fix: Changed enableQualityProcessing default from false to true to match Rust
TypeScript Bindings
- Type exports cleanup: Removed non-existent type exports (EmbeddingConfig, EmbeddingModelType, HierarchyConfig, ImagePreprocessingConfig) from index.ts
Fixed
Elixir Bindings
- Hex package compilation: Fixed force_build: true causing production installs to fail (#333)
  - Changed to force_build: Mix.env() in [:test, :dev] to only build from source in development
  - Production installs now correctly use precompiled NIF binaries from GitHub releases
Docker Images
- Tesseract OCR plugin initialization: Fixed "OCR backend 'tesseract' not registered" error in published Docker images
- Embeddings plugin initialization: Fixed "Failed to initialize embedding model" error
API
- JSON error responses: Added custom JsonApi extractor for consistent JSON error responses
- OpenAPI schema improvements: Enhanced schema validation constraints
- Chunking validation: Added validation that overlap must be less than max_characters
- Embed validation: Added validation that all text entries must be non-empty strings
- Default embedding model: EmbeddingConfig.model now defaults to "balanced" preset
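The two validation rules above can be sketched in Python as follows. This is an illustration of the stated constraints only, not the server's implementation: the function names are hypothetical, and it assumes "non-empty" means a string of length greater than zero (whether whitespace-only strings are rejected is not stated here).

```python
def validate_chunking(max_characters: int, overlap: int) -> None:
    """Reject chunking settings whose overlap is not smaller than the chunk size."""
    if overlap >= max_characters:
        raise ValueError(
            f"overlap ({overlap}) must be less than max_characters ({max_characters})"
        )


def validate_embed_texts(texts: list) -> None:
    """Reject embed requests containing non-string or empty entries."""
    for index, text in enumerate(texts):
        # Assumption: only the empty string counts as "empty".
        if not isinstance(text, str) or text == "":
            raise ValueError(f"texts[{index}] must be a non-empty string")


if __name__ == "__main__":
    validate_chunking(max_characters=1000, overlap=200)  # passes silently
    validate_embed_texts(["first passage", "second passage"])  # passes silently
    try:
        validate_chunking(max_characters=100, overlap=100)
    except ValueError as err:
        print(err)
```

The overlap rule matters because a sliding-window chunker advances by `max_characters - overlap` each step; if that difference is zero or negative, the window never moves forward.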
CI/CD
- Schemathesis API contract testing: Added schemathesis to Docker CI workflow
Rust Core
- XLSX OOM with Excel Solver files: Fixed out-of-memory issue when processing XLSX files with sparse data at extreme cell positions (#331)