github icereed/paperless-gpt v0.25.0
v0.25.0 - "The Vision Expansion"

8 hours ago

🚀 paperless-gpt v0.25.0 - "The Vision Expansion"

Gemini joins the vision OCR party, polling gets smarter, and image processing becomes fully tunable!

🌟 Major New Features

👁️ Google Gemini as Vision (OCR) Provider

Google Gemini can now be used directly as a vision LLM provider for OCR! Because Gemini handles PDFs natively, it supports the image, pdf, and whole_pdf processing modes out of the box, giving you more flexibility in how documents are processed.

  • Set VISION_LLM_PROVIDER: "googleai" and VISION_LLM_MODEL: "gemini-2.5-flash" to get started
  • Optional thinking budget support for reasoning models via GOOGLEAI_THINKING_BUDGET

Thanks to @its-a-unixsystem for this fantastic contribution!
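
Before starting the container, it can help to sanity-check that the Gemini-related variables are consistent with each other. The following is only a minimal pre-flight sketch in Python, not part of paperless-gpt: `check_gemini_vision_config` is a hypothetical helper, and the assumption that an API key and model name are required whenever the provider is "googleai" is inferred from the configuration example in these notes.

```python
import os

# Processing modes named in these release notes.
VALID_MODES = {"image", "pdf", "whole_pdf"}


def check_gemini_vision_config(env=os.environ):
    """Return a list of problems; an empty list means the settings look consistent.

    Hypothetical helper for illustration only, not paperless-gpt's own validation.
    """
    problems = []
    if env.get("VISION_LLM_PROVIDER") != "googleai":
        return problems  # another provider is selected; nothing to check here

    # Assumption: model name and API key are both needed for the googleai provider.
    if not env.get("VISION_LLM_MODEL"):
        problems.append("VISION_LLM_MODEL is not set")
    if not env.get("GOOGLEAI_API_KEY"):
        problems.append("GOOGLEAI_API_KEY is not set")

    mode = env.get("OCR_PROCESS_MODE")
    if mode is not None and mode not in VALID_MODES:
        problems.append(f"OCR_PROCESS_MODE {mode!r} is not one of {sorted(VALID_MODES)}")

    # The thinking budget is optional, but if present it should parse as an integer.
    budget = env.get("GOOGLEAI_THINKING_BUDGET")
    if budget is not None:
        try:
            int(budget)
        except ValueError:
            problems.append("GOOGLEAI_THINKING_BUDGET is not an integer")
    return problems


if __name__ == "__main__":
    for problem in check_gemini_vision_config():
        print("config problem:", problem)
```

Passing a plain dictionary instead of `os.environ` makes the check easy to try out without touching the real environment.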

🎛️ Configurable Image Processing Limits

Image processing limits are no longer hardcoded! You can now fine-tune max pixel dimensions, total pixels, render DPI, and max file size via environment variables. This is especially useful for smaller vision models like minicpm-v that perform better with appropriately sized images.

  • IMAGE_MAX_PIXEL_DIMENSION, IMAGE_MAX_TOTAL_PIXELS, IMAGE_MAX_RENDER_DPI, IMAGE_MAX_FILE_BYTES
  • Sensible defaults maintained for backward compatibility

Great addition by @cfilipov!
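
As a rough illustration of how the two pixel limits could interact, the sketch below reads the four variables and shrinks an image uniformly until both the longest side and the total pixel count fit. It is written in Python for brevity and is not paperless-gpt's actual code: `load_limits` and `fit_to_limits` are hypothetical helpers, and the fallback values are simply the example values from the configuration section of these notes, which may or may not match the built-in defaults.

```python
import math
import os


def load_limits(env=os.environ):
    """Read the four limits, falling back to the example values from these notes."""
    return {
        "max_dim": int(env.get("IMAGE_MAX_PIXEL_DIMENSION", "10000")),
        "max_pixels": int(env.get("IMAGE_MAX_TOTAL_PIXELS", "40000000")),
        "max_dpi": int(env.get("IMAGE_MAX_RENDER_DPI", "300")),
        "max_bytes": int(env.get("IMAGE_MAX_FILE_BYTES", "10485760")),
    }


def fit_to_limits(width, height, limits):
    """Return (width, height) scaled down uniformly to satisfy both pixel limits."""
    scale = 1.0
    longest = max(width, height)
    if longest > limits["max_dim"]:
        # Shrink so the longest side equals the per-dimension limit.
        scale = min(scale, limits["max_dim"] / longest)
    if width * height > limits["max_pixels"]:
        # Area scales with the square of the factor, hence the square root.
        scale = min(scale, math.sqrt(limits["max_pixels"] / (width * height)))
    # Round down so the result never exceeds a limit; keep at least one pixel.
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))


if __name__ == "__main__":
    limits = load_limits()
    print(fit_to_limits(20000, 10000, limits))
```

A smaller vision model might be paired with much lower values, for example a `max_dim` of a couple of thousand pixels, so that pages are downscaled before they are sent for OCR. The DPI and file-size limits are loaded here but not applied, since how they are enforced is not described in these notes.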

⚡ Performance Improvements

🔄 Smarter Paperless Polling

Polling for new documents is now significantly faster and lighter on resources. Instead of querying the full documents API, paperless-gpt now uses tag-based document counting to short-circuit when there is nothing to process, cutting response times from 500–1000 ms to 10–50 ms on larger instances.

  • Fixes high CPU usage on paperless-ngx caused by aggressive polling
  • Single-tag retrieval replaces multi-tag queries
  • API calls are aborted early when no documents need processing

Thanks to @apoapoapo for this performance win!
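
The short-circuit idea described above can be sketched roughly as follows. This is an illustrative Python outline, not the project's implementation: `poll_once`, `count_for_tag`, and `fetch_for_tag` are hypothetical names, and the two callables stand in for whatever paperless-ngx requests are actually used for the cheap count and the full document query.

```python
from typing import Callable, Iterable, List


def poll_once(
    tags: Iterable[str],
    count_for_tag: Callable[[str], int],
    fetch_for_tag: Callable[[str], List[dict]],
) -> List[dict]:
    """Fetch documents only for tags whose cheap count is non-zero."""
    found: List[dict] = []
    for tag in tags:
        # Cheap check first: one lightweight count per tag.
        if count_for_tag(tag) == 0:
            continue  # short-circuit: skip the expensive document query
        found.extend(fetch_for_tag(tag))
    return found


if __name__ == "__main__":
    # Stand-in data so the sketch runs without a paperless-ngx instance.
    fake_counts = {"tag-a": 0, "tag-b": 1}
    docs = poll_once(
        fake_counts.keys(),
        count_for_tag=lambda tag: fake_counts[tag],
        fetch_for_tag=lambda tag: [{"id": 42, "tag": tag}],
    )
    print(docs)
```

The point of this shape is that an idle polling cycle costs only the count lookups, and the heavier document listing is reached only when there is something to process.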

🔧 CI/CD Improvements

🛡️ Secure Fork PR E2E Testing

Fork pull requests can now run E2E tests securely via a label-based approval workflow. After reviewing the code, a maintainer adds the safe-to-test label, which triggers the E2E tests with access to secrets. The tests use local Docker builds, so no registry credentials are needed.
Contributed by @Copilot!


⚙️ Configuration Highlights

New Environment Variables

# Gemini Vision OCR
VISION_LLM_PROVIDER: "googleai"
VISION_LLM_MODEL: "gemini-2.5-flash"
GOOGLEAI_API_KEY: "your-api-key"
GOOGLEAI_THINKING_BUDGET: "16384"  # Optional
OCR_PROCESS_MODE: "whole_pdf"      # image, pdf, or whole_pdf

# Image Processing Limits
IMAGE_MAX_PIXEL_DIMENSION: "10000"
IMAGE_MAX_TOTAL_PIXELS: "40000000"
IMAGE_MAX_RENDER_DPI: "300"
IMAGE_MAX_FILE_BYTES: "10485760"

No Breaking Changes

Existing configurations continue to work as expected. All new features are opt-in!


📋 What's Changed

  • Add secure fork PR E2E testing with label-based approval by @Copilot in #857
  • Improve paperless polling for new documents by @apoapoapo in #889
  • feat: make image processing limits configurable by @cfilipov in #883
  • add Gemini as Vision (OCR) provider by @its-a-unixsystem in #881

🎉 New Contributors

A huge welcome to our new contributors! 🙌


Full Changelog: v0.24.0...v0.25.0


Ready to upgrade?

docker pull icereed/paperless-gpt:latest

📚 Check out the README for full documentation on all features!
