Release Highlights
New Features
Mistral OCR Integration with Advanced PDF Processing
- Extended PDF processing support - Mistral OCR now joins Google Document AI in supporting all processing modes: image, pdf, and whole_pdf
- Cost-effective OCR - Purpose-built OCR endpoint optimized for document processing with competitive pricing
- Markdown-formatted output - Returns well-structured markdown text that preserves document formatting and layout
- Large document support - Handles files up to 50MB and 1,000 pages efficiently
- Set OCR_PROVIDER: "mistral_ocr" and configure your Mistral API key to get started
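The size and page limits quoted above lend themselves to a pre-flight check before a document is submitted. The following is a minimal sketch, not code from this project: the function name and constants are hypothetical, and "50MB" is assumed to mean 50 × 1024 × 1024 bytes.

```python
# Hypothetical pre-flight check; names and the byte interpretation of
# "50MB" are assumptions, not part of the released code.
MAX_FILE_BYTES = 50 * 1024 * 1024  # "50MB" read as 50 MiB (assumption)
MAX_PAGES = 1000                   # page limit quoted in these notes


def within_mistral_ocr_limits(file_bytes: int, page_count: int) -> bool:
    """Return True if a document fits the size and page limits above."""
    size_ok = 0 < file_bytes <= MAX_FILE_BYTES
    pages_ok = 0 < page_count <= MAX_PAGES
    return size_ok and pages_ok
```

A caller might use this to skip or split oversized documents before any API request is made.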
Enhanced Title Generation with Context
- Original title context - Title generation now includes the existing document title as contextual information
- Improved relevance - Language models can use the original title to generate more accurate and contextually appropriate suggestions
- Better continuity - Maintains document naming consistency while enhancing title quality
- Smart fallbacks - Handles cases where original titles are missing or incomplete
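The idea behind these bullets can be illustrated with a small prompt builder. This is only a sketch under stated assumptions: the function name and the prompt wording are hypothetical, and the actual prompt template used by the project is not shown in these notes.

```python
from typing import Optional


def build_title_prompt(content: str, original_title: Optional[str] = None) -> str:
    """Build a title-generation prompt (illustrative wording only).

    The original title is included as context when it is present and
    non-blank; otherwise the prompt falls back to content only.
    """
    lines = ["Suggest a concise, descriptive title for the document below."]
    title = (original_title or "").strip()
    if title:
        # Context line: lets the model stay consistent with existing naming.
        lines.append(f'Current title: "{title}"')
    lines.append("")
    lines.append("Document content:")
    lines.append(content.strip())
    return "\n".join(lines)
```

Treating a blank or missing title the same way keeps the fallback path simple: the model never sees an empty "Current title" line.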
Improvements & Refinements
Configuration Validation
- OCR provider compatibility checks - Prevents invalid combinations of OCR providers and processing modes
- Clear error messages - Detailed feedback when unsupported mode combinations are detected
- Startup validation - Early detection of configuration issues before processing begins
- Provider-specific guidance - Helpful error messages explain which modes are supported by each provider
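A startup check of this kind can be sketched as a lookup against a provider-to-modes map. The sketch below is not the project's implementation: apart from mistral_ocr, the provider identifiers are assumed, and the supported-mode sets are inferred from the statement that only Google Document AI and Mistral OCR support pdf and whole_pdf.

```python
# Hypothetical provider identifiers (only "mistral_ocr" appears in these
# notes); mode sets are inferred from the release text, not copied from code.
SUPPORTED_MODES = {
    "llm": {"image"},
    "azure": {"image"},
    "google_docai": {"image", "pdf", "whole_pdf"},
    "mistral_ocr": {"image", "pdf", "whole_pdf"},
    "docling": {"image"},
}


def validate_ocr_config(provider: str, mode: str) -> None:
    """Raise ValueError if the provider/mode combination is unsupported."""
    modes = SUPPORTED_MODES.get(provider)
    if modes is None:
        known = ", ".join(sorted(SUPPORTED_MODES))
        raise ValueError(f"unknown OCR provider {provider!r}; expected one of: {known}")
    if mode not in modes:
        supported = ", ".join(sorted(modes))
        raise ValueError(
            f"OCR provider {provider!r} does not support mode {mode!r}; "
            f"supported modes: {supported}"
        )
```

Running such a check once at startup, before any document is queued, surfaces a misconfiguration immediately instead of on the first failed OCR call.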
Enhanced PDF Processing Architecture
- Hybrid file naming - Improved PDF splitting with standardized naming conventions that maintain backward compatibility
- More provider choice - Users can now choose between Google Document AI and Mistral OCR for advanced PDF processing
- Consistent behavior - Both advanced providers support pdf and whole_pdf modes with similar performance characteristics
Comprehensive E2E Testing
- Mistral OCR test suite - Full end-to-end testing of Mistral OCR integration with real PDF documents
- Processing mode validation - Tests verify that whole_pdf mode works correctly with multi-page documents
- Performance metrics - Test output includes detailed comparison of original vs. enhanced OCR content
- Cross-provider compatibility - Tests ensure consistent behavior across different OCR providers
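A comparison of original and enhanced OCR content can be produced with a standard line-based diff. The sketch below uses Python's difflib purely for illustration; the helper name is hypothetical, and the project's own diff utilities are not shown in these notes and may work differently.

```python
import difflib


def ocr_diff(original: str, enhanced: str) -> str:
    """Return a unified diff between two OCR text results.

    An empty string means the two texts are identical line by line.
    """
    diff = difflib.unified_diff(
        original.splitlines(),
        enhanced.splitlines(),
        fromfile="original",
        tofile="enhanced",
        lineterm="",
    )
    return "\n".join(diff)
```

Printing the result in test output makes it easy to see which lines the new provider changed.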
Documentation Updates
OCR Provider Comparison
- Updated provider documentation - Clear explanation of which providers support which processing modes
- Mode compatibility matrix - Easy reference for choosing the right provider and mode combination
- Mistral-specific guidance - Detailed setup instructions and best practices for Mistral OCR
- Configuration examples - Complete docker-compose examples for all supported configurations
Technical Details
Provider Mode Support Matrix
Provider | image | pdf | whole_pdf |
---|---|---|---|
LLM (OpenAI/Ollama) | ✅ | ❌ | ❌ |
Azure Document Intelligence | ✅ | ❌ | ❌ |
Google Document AI | ✅ | ✅ | ✅ |
Mistral OCR (New!) | ✅ | ✅ | ✅ |
Docling | ✅ | ❌ | ❌ |
What's Changed
- feat: Add Mistral OCR provider with advanced PDF processing support - Extends pdf and whole_pdf mode support to a second provider
- feat: Add OCR provider and processing mode validation - Prevents misconfigurations and provides helpful error messages
- feat: Pass original document title to title generation prompt - Improves context and relevance of AI-generated titles #453
- feat: Implement hybrid PDF naming strategy - Improved file naming with backward compatibility
- test: Add comprehensive Mistral OCR E2E tests - Full test coverage including diff comparison utilities
- docs: Update OCR processing modes documentation - Clear provider compatibility information
Configuration Example
environment:
  # Mistral OCR (new advanced PDF support)
  OCR_PROVIDER: "mistral_ocr"
  MISTRAL_API_KEY: "your_mistral_api_key"
  MISTRAL_MODEL: "mistral-ocr-latest" # Optional
  OCR_PROCESS_MODE: "whole_pdf" # Now supported!
Migration Notes
- No breaking changes - Existing configurations continue to work as expected
- More provider choice - Users now have two options for advanced PDF processing (pdf and whole_pdf modes)
Performance Benefits
- Provider flexibility - Choose between Google Document AI and Mistral OCR based on your needs and pricing preferences
- Reduced API calls - whole_pdf mode processes entire documents in one request (now available with both advanced providers)
- Better accuracy - Direct PDF processing maintains document structure and formatting
- Smarter title generation - Original title context leads to more relevant AI suggestions
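The reduction in API calls can be made concrete with a rough request count per mode. This is a sketch under a loud assumption: the notes only state that whole_pdf uses one request per document, so the per-page behavior of image and pdf modes below is assumed for illustration, and the function name is hypothetical.

```python
def estimated_ocr_requests(mode: str, page_count: int) -> int:
    """Estimate OCR requests needed for one document (illustrative only)."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    if mode == "whole_pdf":
        # Stated in these notes: the entire document goes in one request.
        return 1
    if mode in ("image", "pdf"):
        # Assumption: these modes send one request per page.
        return page_count
    raise ValueError(f"unknown processing mode: {mode!r}")
```

Under that assumption, a 40-page document would need 40 requests in pdf mode and a single request in whole_pdf mode.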
PRs
- fix(deps): update module github.com/pdfcpu/pdfcpu to v0.11.0 by @renovate in #434
- fix: mislabeled data types in azure types by @moarsmokes in #455
- chore(deps): update react monorepo to v19.1.7 by @renovate in #429
- chore(deps): update dependency @vitejs/plugin-react-swc to v3.10.2 by @renovate in #424
- Enhance title suggestions with original title by @icereed in #466
- [mistral-ocr] Add MIME type detection, structured logging, and improv… by @icereed in #468
Full Changelog: v0.20.0...v0.21.0