icereed/paperless-gpt v0.21.0 on GitHub

Release Highlights 🚀

New Features

🔮 Mistral OCR Integration with Advanced PDF Processing

Extended PDF processing support - Mistral OCR now joins Google Document AI as the second provider to support all three processing modes: image, pdf, and whole_pdf
Cost-effective OCR - Uses Mistral's purpose-built OCR endpoint, which is optimized for document processing and competitively priced
Markdown-formatted output - Returns well-structured markdown text that preserves document formatting and layout
Large document support - Handles files of up to 50 MB and 1,000 pages
Getting started - Set OCR_PROVIDER: "mistral_ocr" and provide your Mistral API key via MISTRAL_API_KEY
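Because Mistral OCR is limited to 50 MB and 1,000 pages per file, a caller might check a document before uploading it in whole_pdf mode. The Go sketch below shows one such pre-flight check. The `document` struct and `checkMistralLimits` are hypothetical names, not paperless-gpt code, and the sketch assumes "50MB" means 50 MiB.

```go
package main

import "fmt"

// Limits stated in the v0.21.0 release notes for Mistral OCR.
// Assumption: "50MB" is interpreted as 50 MiB (50 * 1024 * 1024 bytes).
const (
	maxMistralBytes = 50 * 1024 * 1024
	maxMistralPages = 1000
)

// document is a hypothetical stand-in for whatever metadata the caller
// has about a file before uploading it.
type document struct {
	Name      string
	SizeBytes int64
	Pages     int
}

// checkMistralLimits returns nil if doc is within both limits, otherwise
// an error naming the first limit that is exceeded.
func checkMistralLimits(doc document) error {
	if doc.SizeBytes > maxMistralBytes {
		return fmt.Errorf("%s: %d bytes exceeds the %d-byte limit", doc.Name, doc.SizeBytes, maxMistralBytes)
	}
	if doc.Pages > maxMistralPages {
		return fmt.Errorf("%s: %d pages exceeds the %d-page limit", doc.Name, doc.Pages, maxMistralPages)
	}
	return nil
}

func main() {
	docs := []document{
		{Name: "invoice.pdf", SizeBytes: 2 * 1024 * 1024, Pages: 3},
		{Name: "archive.pdf", SizeBytes: 80 * 1024 * 1024, Pages: 400},
		{Name: "ledger.pdf", SizeBytes: 10 * 1024 * 1024, Pages: 1500},
	}
	for _, d := range docs {
		if err := checkMistralLimits(d); err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Println("ok:", d.Name)
	}
}
```

The limits are inclusive: a file of exactly 50 MiB or exactly 1,000 pages passes.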

🏷️ Enhanced Title Generation with Context

Original title context - Title generation now includes the existing document title as contextual information
Improved relevance - Language models can use the original title to generate more accurate and contextually appropriate suggestions
Better continuity - Keeps naming consistent with the existing title while improving title quality
Smart fallbacks - Handles cases where original titles are missing or incomplete

Improvements & Refinements

🛡️ Configuration Validation

OCR provider compatibility checks - Prevents invalid combinations of OCR providers and processing modes
Clear error messages - Detailed feedback when unsupported mode combinations are detected
Startup validation - Early detection of configuration issues before processing begins
Provider-specific guidance - Helpful error messages explain which modes are supported by each provider
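The startup check described above can be sketched as a lookup against the provider mode support matrix in these notes. This is an illustrative sketch, not paperless-gpt's implementation. Only `mistral_ocr` is confirmed as an OCR_PROVIDER value by these notes, so the other provider keys (`llm`, `azure`, `google_docai`, `docling`) and the function name `validateOCRConfig` are assumptions.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// supportedModes mirrors the provider mode support matrix in these release
// notes. Only "mistral_ocr" is confirmed by the notes; the other provider
// keys are assumed names for illustration.
var supportedModes = map[string][]string{
	"llm":          {"image"},
	"azure":        {"image"},
	"google_docai": {"image", "pdf", "whole_pdf"},
	"mistral_ocr":  {"image", "pdf", "whole_pdf"},
	"docling":      {"image"},
}

// validateOCRConfig rejects unknown providers and unsupported modes,
// and lists the valid options in the error message.
func validateOCRConfig(provider, mode string) error {
	modes, ok := supportedModes[provider]
	if !ok {
		known := make([]string, 0, len(supportedModes))
		for p := range supportedModes {
			known = append(known, p)
		}
		sort.Strings(known) // map iteration order is random; sort for stable output
		return fmt.Errorf("unknown OCR_PROVIDER %q (expected one of: %s)", provider, strings.Join(known, ", "))
	}
	for _, m := range modes {
		if m == mode {
			return nil
		}
	}
	return fmt.Errorf("OCR_PROCESS_MODE %q is not supported by %q (supported: %s)", mode, provider, strings.Join(modes, ", "))
}

func main() {
	// Run once at startup, before any document is processed.
	for _, c := range [][2]string{
		{"mistral_ocr", "whole_pdf"},
		{"azure", "pdf"},
	} {
		if err := validateOCRConfig(c[0], c[1]); err != nil {
			fmt.Println("config error:", err)
			continue
		}
		fmt.Printf("%s/%s: ok\n", c[0], c[1])
	}
}
```

Keeping the matrix as data rather than as scattered conditionals means adding a provider only requires one new map entry, and the error message stays in sync automatically.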

📄 Enhanced PDF Processing Architecture

Hybrid file naming - Improved PDF splitting with standardized naming conventions that maintain backward compatibility
More provider choice - Users can now choose between Google Document AI and Mistral OCR for advanced PDF processing
Consistent behavior - Both advanced providers support pdf and whole_pdf modes with similar performance characteristics

🧪 Comprehensive E2E Testing

Mistral OCR test suite - Full end-to-end testing of Mistral OCR integration with real PDF documents
Processing mode validation - Tests verify whole_pdf mode works correctly with multi-page documents
Performance metrics - Test output includes a detailed comparison of original and enhanced OCR content
Cross-provider compatibility - Tests ensure consistent behavior across different OCR providers
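These notes do not describe how the E2E diff comparison utilities work. As a hypothetical illustration only, a line-level comparison of original and enhanced OCR output could count kept, removed, and added lines using a longest-common-subsequence (LCS) table. `diffStats` is an invented name, not part of paperless-gpt.

```go
package main

import (
	"fmt"
	"strings"
)

// diffStats counts lines kept, removed, and added between two texts using a
// longest-common-subsequence table over lines.
func diffStats(original, enhanced string) (kept, removed, added int) {
	a := strings.Split(original, "\n")
	b := strings.Split(enhanced, "\n")
	// lcs[i][j] holds the LCS length of a[i:] and b[j:].
	lcs := make([][]int, len(a)+1)
	for i := range lcs {
		lcs[i] = make([]int, len(b)+1)
	}
	for i := len(a) - 1; i >= 0; i-- {
		for j := len(b) - 1; j >= 0; j-- {
			if a[i] == b[j] {
				lcs[i][j] = lcs[i+1][j+1] + 1
			} else if lcs[i+1][j] >= lcs[i][j+1] {
				lcs[i][j] = lcs[i+1][j]
			} else {
				lcs[i][j] = lcs[i][j+1]
			}
		}
	}
	kept = lcs[0][0]
	return kept, len(a) - kept, len(b) - kept
}

func main() {
	original := "INV0ICE 123\nTota1: 40 EUR\nThank you"
	enhanced := "# Invoice 123\nTotal: 40 EUR\nThank you"
	k, r, a := diffStats(original, enhanced)
	fmt.Printf("kept=%d removed=%d added=%d\n", k, r, a)
	fmt.Printf("chars: %d -> %d\n", len(original), len(enhanced))
}
```

The table uses O(n·m) memory, which is acceptable for test fixtures but would need a linear-space algorithm for very long documents.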

Documentation Updates

📚 OCR Provider Comparison

Updated provider documentation - Clear explanation of which providers support which processing modes
Mode compatibility matrix - Easy reference for choosing the right provider and mode combination
Mistral-specific guidance - Detailed setup instructions and best practices for Mistral OCR
Configuration examples - Complete docker-compose examples for all supported configurations

Technical Details

Provider Mode Support Matrix

Provider	image	pdf	whole_pdf
LLM (OpenAI/Ollama)	✅	❌	❌
Azure Document Intelligence	✅	❌	❌
Google Document AI	✅	✅	✅
Mistral OCR (New!)	✅	✅	✅
Docling	✅	❌	❌

What's Changed

feat: Add Mistral OCR provider with advanced PDF processing support - Extends pdf and whole_pdf mode support to a second provider
feat: Add OCR provider and processing mode validation - Prevents misconfigurations and provides helpful error messages
feat: Pass original document title to title generation prompt - Improves context and relevance of AI-generated titles #453
feat: Implement hybrid PDF naming strategy - Improved file naming with backward compatibility
test: Add comprehensive Mistral OCR E2E tests - Full test coverage including diff comparison utilities
docs: Update OCR processing modes documentation - Clear provider compatibility information

Configuration Example

environment:
  # Mistral OCR (new advanced PDF support)
  OCR_PROVIDER: "mistral_ocr"
  MISTRAL_API_KEY: "your_mistral_api_key"
  MISTRAL_MODEL: "mistral-ocr-latest"  # Optional
  OCR_PROCESS_MODE: "whole_pdf"        # Now supported!

Migration Notes

No breaking changes - Existing configurations continue to work as expected
More provider choice - Users now have two options for advanced PDF processing (pdf and whole_pdf modes)

Performance Benefits

Provider flexibility - Choose between Google Document AI and Mistral OCR based on your needs and pricing preferences
Reduced API calls - whole_pdf mode processes entire documents in one request (now available with both advanced providers)
Better accuracy - Direct PDF processing maintains document structure and formatting
Smarter title generation - Original title context leads to more relevant AI suggestions
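The request savings from whole_pdf mode can be illustrated with simple arithmetic. The sketch below assumes that image and pdf modes send one OCR request per page. That is an assumption, not something these notes state. The only stated fact is that whole_pdf sends the entire document in one request. `requestsFor` is a hypothetical helper.

```go
package main

import "fmt"

// requestsFor returns the number of OCR requests needed for a document of
// `pages` pages. Assumption: image and pdf modes send one request per page;
// whole_pdf sends the entire document in a single request.
func requestsFor(mode string, pages int) (int, error) {
	if pages < 1 {
		return 0, fmt.Errorf("page count must be positive, got %d", pages)
	}
	switch mode {
	case "image", "pdf":
		return pages, nil
	case "whole_pdf":
		return 1, nil
	default:
		return 0, fmt.Errorf("unknown mode %q", mode)
	}
}

func main() {
	const pages = 12
	for _, mode := range []string{"image", "pdf", "whole_pdf"} {
		n, err := requestsFor(mode, pages)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%-9s %2d request(s) for %d pages\n", mode, n, pages)
	}
}
```

Under these assumptions, a 12-page document drops from 12 requests to 1. Fewer requests do not necessarily mean less data sent, since the whole file still travels in that single request.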

PRs

fix(deps): update module github.com/pdfcpu/pdfcpu to v0.11.0 by @renovate in #434
fix: mislabeled data types in azure types by @moarsmokes in #455
chore(deps): update react monorepo to v19.1.7 by @renovate in #429
chore(deps): update dependency @vitejs/plugin-react-swc to v3.10.2 by @renovate in #424
Enhance title suggestions with original title by @icereed in #466
[mistral-ocr] Add MIME type detection, structured logging, and improv… by @icereed in #468

Full Changelog: v0.20.0...v0.21.0