github icereed/paperless-gpt v0.21.0

2 months ago

Release Highlights 🚀

New Features

🔮 Mistral OCR Integration with Advanced PDF Processing

  • Extended PDF processing support - Mistral OCR now joins Google Document AI in supporting all processing modes: image, pdf, and whole_pdf
  • Cost-effective OCR - Purpose-built OCR endpoint optimized for document processing with competitive pricing
  • Markdown-formatted output - Returns well-structured markdown text that preserves document formatting and layout
  • Large document support - Handles files up to 50MB and 1,000 pages efficiently
  • Getting started - Set OCR_PROVIDER: "mistral_ocr" and provide your Mistral API key via MISTRAL_API_KEY
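
The size and page limits above lend themselves to a quick pre-flight check before a document is sent for OCR. The following is a minimal sketch in Python, not code from paperless-gpt: the function name is hypothetical, and it assumes that "50MB" means 50 MiB and that both limits are inclusive.

```python
# Hypothetical pre-flight check for the limits quoted in the release notes.
# Assumptions: "50MB" is read as 50 MiB, and both limits are inclusive.
MAX_FILE_BYTES = 50 * 1024 * 1024
MAX_PAGES = 1000


def fits_mistral_ocr_limits(size_bytes: int, page_count: int) -> bool:
    """Return True if a document is within the stated size and page limits."""
    return size_bytes <= MAX_FILE_BYTES and page_count <= MAX_PAGES
```

A document that fails this check would need to be split or handled some other way before upload; how paperless-gpt itself treats oversized files is not stated in these notes.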

🏷️ Enhanced Title Generation with Context

  • Original title context - Title generation now includes the existing document title as contextual information
  • Improved relevance - Language models can use the original title to generate more accurate and contextually appropriate suggestions
  • Better continuity - Maintains document naming consistency while enhancing title quality
  • Smart fallbacks - Handles cases where original titles are missing or incomplete
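
To illustrate the idea of passing the original title as context with a fallback, here is a minimal Python sketch. It is not the project's prompt template: the function name and the prompt wording are assumptions made for illustration only.

```python
from typing import Optional


def build_title_prompt(content: str, original_title: Optional[str] = None) -> str:
    """Build a title-generation prompt (hypothetical wording, not the real template)."""
    lines = ["Suggest a concise, descriptive title for the document below."]
    title = (original_title or "").strip()
    if title:
        # Pass the existing title along as context for the language model.
        lines.append(f"The current title is: {title}")
        lines.append("Use it as a hint, but improve it if the content suggests a better one.")
    else:
        # Fallback when the original title is missing or blank.
        lines.append("No usable original title is available; rely on the content alone.")
    lines.append("")
    lines.append("Document content:")
    lines.append(content)
    return "\n".join(lines)
```

For example, `build_title_prompt("Invoice 2024-017 ...", "scan_0001.pdf")` includes the old file name as a hint, while an empty or whitespace-only title takes the fallback branch.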

Improvements & Refinements

🛡️ Configuration Validation

  • OCR provider compatibility checks - Prevents invalid combinations of OCR providers and processing modes
  • Clear error messages - Detailed feedback when unsupported mode combinations are detected
  • Startup validation - Early detection of configuration issues before processing begins
  • Provider-specific guidance - Helpful error messages explain which modes are supported by each provider

📄 Enhanced PDF Processing Architecture

  • Hybrid file naming - Improved PDF splitting with standardized naming conventions that maintain backward compatibility
  • More provider choice - Users can now choose between Google Document AI and Mistral OCR for advanced PDF processing
  • Consistent behavior - Both advanced providers support pdf and whole_pdf modes with similar performance characteristics

🧪 Comprehensive E2E Testing

  • Mistral OCR test suite - Full end-to-end testing of Mistral OCR integration with real PDF documents
  • Processing mode validation - Tests verify whole_pdf mode works correctly with multi-page documents
  • Performance metrics - Test output includes detailed comparison of original vs. enhanced OCR content
  • Cross-provider compatibility - Tests ensure consistent behavior across different OCR providers

Documentation Updates

📚 OCR Provider Comparison

  • Updated provider documentation - Clear explanation of which providers support which processing modes
  • Mode compatibility matrix - Easy reference for choosing the right provider and mode combination
  • Mistral-specific guidance - Detailed setup instructions and best practices for Mistral OCR
  • Configuration examples - Complete docker-compose examples for all supported configurations

Technical Details

Provider Mode Support Matrix

Provider                      image   pdf   whole_pdf
LLM (OpenAI/Ollama)           ✅      ❌    ❌
Azure Document Intelligence   ✅      ❌    ❌
Google Document AI            ✅      ✅    ✅
Mistral OCR (New!)            ✅      ✅    ✅
Docling                       ✅      ❌    ❌
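
The matrix above maps naturally onto a lookup table that can be checked at startup. The sketch below is written in Python for illustration and is not the project's actual validation code. Only the "mistral_ocr" identifier appears in these notes; the other provider keys and the function name are assumptions.

```python
# Provider -> supported processing modes, transcribed from the matrix above.
# Only "mistral_ocr" is confirmed by the release notes; the other keys are
# assumed identifiers used here for illustration.
SUPPORTED_MODES = {
    "llm": {"image"},
    "azure": {"image"},
    "google_docai": {"image", "pdf", "whole_pdf"},
    "mistral_ocr": {"image", "pdf", "whole_pdf"},
    "docling": {"image"},
}


def validate_ocr_config(provider: str, mode: str) -> None:
    """Raise ValueError with a provider-specific hint for unsupported combinations."""
    modes = SUPPORTED_MODES.get(provider)
    if modes is None:
        known = ", ".join(sorted(SUPPORTED_MODES))
        raise ValueError(f"unknown OCR provider {provider!r}; expected one of: {known}")
    if mode not in modes:
        supported = ", ".join(sorted(modes))
        raise ValueError(
            f"OCR provider {provider!r} does not support mode {mode!r}; "
            f"supported modes: {supported}"
        )
```

Calling `validate_ocr_config("mistral_ocr", "whole_pdf")` returns silently, while `validate_ocr_config("docling", "pdf")` raises an error that names the modes Docling does support. Running such a check once at startup surfaces a bad combination before any document is processed.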

What's Changed

  • feat: Add Mistral OCR provider with advanced PDF processing support - Extends pdf and whole_pdf mode support to a second provider
  • feat: Add OCR provider and processing mode validation - Prevents misconfigurations and provides helpful error messages
  • feat: Pass original document title to title generation prompt - Improves context and relevance of AI-generated titles #453
  • feat: Implement hybrid PDF naming strategy - Improved file naming with backward compatibility
  • test: Add comprehensive Mistral OCR E2E tests - Full test coverage including diff comparison utilities
  • docs: Update OCR processing modes documentation - Clear provider compatibility information

Configuration Example

environment:
  # Mistral OCR (new advanced PDF support)
  OCR_PROVIDER: "mistral_ocr"
  MISTRAL_API_KEY: "your_mistral_api_key"
  MISTRAL_MODEL: "mistral-ocr-latest"  # Optional
  OCR_PROCESS_MODE: "whole_pdf"        # Now supported!

Migration Notes

  • No breaking changes - Existing configurations continue to work as expected
  • More provider choice - Users now have two options for advanced PDF processing (pdf and whole_pdf modes)

Performance Benefits

  • Provider flexibility - Choose between Google Document AI and Mistral OCR based on your needs and pricing preferences
  • Reduced API calls - whole_pdf mode processes entire documents in one request (now available with both advanced providers)
  • Better accuracy - Direct PDF processing maintains document structure and formatting
  • Smarter title generation - Original title context leads to more relevant AI suggestions

PRs

  • fix(deps): update module github.com/pdfcpu/pdfcpu to v0.11.0 by @renovate in #434
  • fix: mislabeled data types in azure types by @moarsmokes in #455
  • chore(deps): update react monorepo to v19.1.7 by @renovate in #429
  • chore(deps): update dependency @vitejs/plugin-react-swc to v3.10.2 by @renovate in #424
  • Enhance title suggestions with original title by @icereed in #466
  • [mistral-ocr] Add MIME type detection, structured logging, and improv… by @icereed in #468

Full Changelog: v0.20.0...v0.21.0
