🚀 paperless-gpt v0.25.0 - "The Vision Expansion"

Gemini joins the vision OCR party, polling gets smarter, and image processing becomes fully tunable!

🌟 Major New Features

👁️ Google Gemini as Vision (OCR) Provider

Google Gemini can now be used directly as a vision LLM provider for OCR! With native PDF handling, Gemini supports image, pdf, and whole_pdf processing modes out of the box—giving you even more flexibility for document processing.

Set VISION_LLM_PROVIDER: "googleai" and VISION_LLM_MODEL: "gemini-2.5-flash" to get started
Optional thinking budget support for reasoning models via GOOGLEAI_THINKING_BUDGET
Thanks to @its-a-unixsystem for this fantastic contribution!

🎛️ Configurable Image Processing Limits

Image processing limits are no longer hardcoded! You can now fine-tune max pixel dimensions, total pixels, render DPI, and max file size via environment variables. This is especially useful for smaller vision models like minicpm-v that perform better with appropriately sized images.

IMAGE_MAX_PIXEL_DIMENSION, IMAGE_MAX_TOTAL_PIXELS, IMAGE_MAX_RENDER_DPI, IMAGE_MAX_FILE_BYTES
Sensible defaults maintained for backward compatibility
Great addition by @cfilipov!
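
To illustrate how a pixel-dimension limit and a total-pixel limit can interact, here is a minimal Python sketch. It is not paperless-gpt's actual implementation: the function name `fit_within_limits` is hypothetical, and the assumption that both limits are enforced by aspect-preserving downscaling is ours, not something the release notes state.

```python
import math


def fit_within_limits(width, height, max_dimension, max_total_pixels):
    """Scale (width, height) down, keeping aspect ratio, to satisfy both limits.

    Hypothetical helper for illustration only; the real project may apply
    IMAGE_MAX_PIXEL_DIMENSION and IMAGE_MAX_TOTAL_PIXELS differently.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    # Factor needed so the longest side fits within max_dimension.
    scale = min(1.0, max_dimension / max(width, height))
    # Factor needed so width * height fits within max_total_pixels.
    # Area scales with the square of the factor, hence the square root.
    scale = min(scale, math.sqrt(max_total_pixels / (width * height)))

    if scale >= 1.0:
        return width, height  # already within both limits

    # Round down so neither limit is exceeded after rounding.
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))


if __name__ == "__main__":
    # Limit values taken from the example configuration in these notes.
    print(fit_within_limits(20000, 10000, 10000, 40_000_000))
```

In this sketch the stricter of the two limits wins: a 20000x10000 page would fit the dimension limit at half size, but the total-pixel limit forces it smaller still.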

⚡ Performance Improvements

🔄 Smarter Paperless Polling

Polling for new documents is now significantly faster and lighter on resources. Instead of querying the full documents API, paperless-gpt now uses tag-based document counting to short-circuit when there's nothing to process, reducing response times from 500–1000 ms down to 10–50 ms on larger instances.

Fixes high CPU usage on paperless-ngx caused by aggressive polling
Single-tag retrieval replaces multi-tag queries
API calls are aborted early when no documents need processing
Thanks to @apoapoapo for this performance win!
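
The short-circuit idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's code: `poll_once`, `count_tagged`, and `fetch_documents` are hypothetical names, and the two callables stand in for whatever cheap count request and full document query the real client performs.

```python
from typing import Callable, List


def poll_once(
    tags: List[str],
    count_tagged: Callable[[str], int],
    fetch_documents: Callable[[str], list],
) -> list:
    """Run the expensive document query only for tags that have documents.

    Both callables are hypothetical stand-ins for HTTP calls.
    """
    pending = []
    for tag in tags:
        # Cheap check first: asking for a count is far lighter than
        # listing full document records.
        if count_tagged(tag) == 0:
            continue  # short-circuit: nothing to process for this tag
        pending.extend(fetch_documents(tag))
    return pending


if __name__ == "__main__":
    # In-memory stubs; the tag names are made up for this example.
    counts = {"manual-tag": 0, "auto-tag": 2}
    docs = {"manual-tag": [], "auto-tag": ["doc-17", "doc-42"]}
    print(poll_once(list(counts), counts.__getitem__, docs.__getitem__))
```

When every count is zero, the expensive query is never issued, which is where the savings on an idle instance would come from.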

🔧 CI/CD Improvements

🛡️ Secure Fork PR E2E Testing

Fork pull requests can now run E2E tests securely via a label-based approval workflow. Maintainers add the safe-to-test label after code review, which triggers E2E tests with secret access using local Docker builds—no registry credentials needed.
Contributed by @Copilot!

⚙️ Configuration Highlights

New Environment Variables

# Gemini Vision OCR
VISION_LLM_PROVIDER: "googleai"
VISION_LLM_MODEL: "gemini-2.5-flash"
GOOGLEAI_API_KEY: "your-api-key"
GOOGLEAI_THINKING_BUDGET: "16384"  # Optional
OCR_PROCESS_MODE: "whole_pdf"      # image, pdf, or whole_pdf

# Image Processing Limits
IMAGE_MAX_PIXEL_DIMENSION: "10000"
IMAGE_MAX_TOTAL_PIXELS: "40000000"   # 40 megapixels
IMAGE_MAX_RENDER_DPI: "300"
IMAGE_MAX_FILE_BYTES: "10485760"     # 10 MiB (10 * 1024 * 1024)

No Breaking Changes

Existing configurations continue to work as expected. All new features are opt-in!

📋 What's Changed

Add secure fork PR E2E testing with label-based approval by @Copilot in #857
Improve paperless polling for new documents by @apoapoapo in #889
feat: make image processing limits configurable by @cfilipov in #883
add Gemini as Vision (OCR) provider by @its-a-unixsystem in #881

🎉 New Contributors

A huge welcome to our new contributors! 🙌

@apoapoapo made their first contribution in #889
@cfilipov made their first contribution in #883
@its-a-unixsystem made their first contribution in #881

Full Changelog: v0.24.0...v0.25.0

Ready to upgrade?

docker pull icereed/paperless-gpt:latest

📚 Check out the README for full documentation on all features!
