🚀 paperless-gpt v0.25.0 - "The Vision Expansion"
Gemini joins the vision OCR party, polling gets smarter, and image processing becomes fully tunable!
🌟 Major New Features
👁️ Google Gemini as Vision (OCR) Provider
Google Gemini can now be used directly as a vision LLM provider for OCR! With native PDF handling, Gemini supports the image, pdf, and whole_pdf processing modes out of the box, giving you more flexibility for document processing.
- Set VISION_LLM_PROVIDER: "googleai" and VISION_LLM_MODEL: "gemini-2.5-flash" to get started
- Optional thinking budget support for reasoning models via GOOGLEAI_THINKING_BUDGET
Thanks to @its-a-unixsystem for this fantastic contribution!
🎛️ Configurable Image Processing Limits
Image processing limits are no longer hardcoded! You can now fine-tune max pixel dimensions, total pixels, render DPI, and max file size via environment variables. This is especially useful for smaller vision models like minicpm-v that perform better with appropriately sized images.
- IMAGE_MAX_PIXEL_DIMENSION, IMAGE_MAX_TOTAL_PIXELS, IMAGE_MAX_RENDER_DPI, IMAGE_MAX_FILE_BYTES
- Sensible defaults maintained for backward compatibility
Great addition by @cfilipov!
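To get a feel for how a dimension limit and a total-pixel limit interact, here is a minimal Python sketch. It is not the project's implementation: the ImageLimits class and fit_within_limits function are hypothetical names, the limit values are simply the examples shown in these notes, and IMAGE_MAX_RENDER_DPI and IMAGE_MAX_FILE_BYTES are left out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageLimits:
    # Hypothetical container. The values mirror the examples in these
    # release notes and are not confirmed to be the project's defaults.
    max_pixel_dimension: int = 10_000
    max_total_pixels: int = 40_000_000


def fit_within_limits(width, height, limits=ImageLimits()):
    """Return (width, height) scaled down to satisfy both limits.

    The aspect ratio is preserved and images are never upscaled.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Scale factor that brings the longer side under the dimension limit.
    side_scale = limits.max_pixel_dimension / max(width, height)
    # Scale factor that brings the area under the total-pixel limit.
    # Area grows with the square of the scale, hence the square root.
    area_scale = math.sqrt(limits.max_total_pixels / (width * height))
    scale = min(1.0, side_scale, area_scale)
    if scale >= 1.0:
        return width, height
    # Round down so the result never exceeds a limit, but keep >= 1 px.
    return (max(1, math.floor(width * scale)),
            max(1, math.floor(height * scale)))


print(fit_within_limits(20_000, 10_000))
print(fit_within_limits(3_000, 2_000))
```

In the first call the total-pixel limit is the tighter constraint: halving the image would satisfy the dimension limit but still leave 50 million pixels. Smaller vision models may want much lower limits than the ones assumed here.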
⚡ Performance Improvements
🔄 Smarter Paperless Polling
Polling for new documents is now significantly faster and lighter on resources. Instead of querying the full documents API, paperless-gpt now uses tag-based document counting to short-circuit when there is nothing to process, cutting response times from 500–1000 ms to 10–50 ms on larger instances.
- Fixes high CPU usage on paperless-ngx caused by aggressive polling
- Single-tag retrieval replaces multi-tag queries
- API calls are aborted early when no documents need processing
Thanks to @apoapoapo for this performance win!
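The short-circuit idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not the code from the PR: poll_once, count_tagged, and fetch_tagged are hypothetical names, and the two callables stand in for the cheap tag count lookup and the expensive documents query against paperless-ngx.

```python
from typing import Callable, List


def poll_once(count_tagged: Callable[[], int],
              fetch_tagged: Callable[[], List[dict]]) -> List[dict]:
    """Run one polling cycle, skipping the expensive query when idle."""
    # Cheap check first: how many documents carry the processing tag?
    if count_tagged() == 0:
        # Nothing to do, so the full documents query is never issued.
        return []
    # Only now pay for the full query.
    return fetch_tagged()


# Usage with in-memory stand-ins instead of real HTTP calls.
print(poll_once(lambda: 0, lambda: [{"id": 1}]))
print(poll_once(lambda: 1, lambda: [{"id": 1}]))
```

Most polling cycles find nothing to process, so the common case costs a single lightweight request.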
🔧 CI/CD Improvements
🛡️ Secure Fork PR E2E Testing
Fork pull requests can now run E2E tests securely via a label-based approval workflow. Maintainers add the safe-to-test label after code review, which triggers E2E tests with secret access using local Docker builds, so no registry credentials are needed.
Contributed by @Copilot!
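The gating rule behind this workflow can be expressed as a small decision function. The Python sketch below is a rough model only, not the actual workflow definition: should_run_e2e is a hypothetical name, and the rule that pull requests from the main repository run without a label is an assumption, since these notes only describe the fork case.

```python
from typing import Iterable


def should_run_e2e(is_fork: bool, labels: Iterable[str],
                   approval_label: str = "safe-to-test") -> bool:
    """Decide whether E2E tests with secret access may run for a PR."""
    if not is_fork:
        # Assumption: branches inside the main repository are trusted.
        return True
    # Fork PRs need explicit maintainer approval via the label.
    return approval_label in set(labels)


print(should_run_e2e(is_fork=True, labels=["bug"]))
print(should_run_e2e(is_fork=True, labels=["bug", "safe-to-test"]))
```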
⚙️ Configuration Highlights
New Environment Variables
# Gemini Vision OCR
VISION_LLM_PROVIDER: "googleai"
VISION_LLM_MODEL: "gemini-2.5-flash"
GOOGLEAI_API_KEY: "your-api-key"
GOOGLEAI_THINKING_BUDGET: "16384" # Optional
OCR_PROCESS_MODE: "whole_pdf" # image, pdf, or whole_pdf
# Image Processing Limits
IMAGE_MAX_PIXEL_DIMENSION: "10000"
IMAGE_MAX_TOTAL_PIXELS: "40000000"
IMAGE_MAX_RENDER_DPI: "300"
IMAGE_MAX_FILE_BYTES: "10485760"
No Breaking Changes
Existing configurations continue to work as expected. All new features are opt-in!
📋 What's Changed
- Add secure fork PR E2E testing with label-based approval by @Copilot in #857
- Improve paperless polling for new documents by @apoapoapo in #889
- feat: make image processing limits configurable by @cfilipov in #883
- add Gemini as Vision (OCR) provider by @its-a-unixsystem in #881
🎉 New Contributors
A huge welcome to our new contributors! 🙌
- @apoapoapo made their first contribution in #889
- @cfilipov made their first contribution in #883
- @its-a-unixsystem made their first contribution in #881
Full Changelog: v0.24.0...v0.25.0
Ready to upgrade?
docker pull icereed/paperless-gpt:latest
📚 Check out the README for full documentation on all features!