What's New
OCR Receipt Scanning (no API key needed)
- New OCR provider using Tesseract.js: works without any API keys or external services
- Automatic fallback: if the configured AI provider is unavailable, receipt scanning falls back to OCR
- Image preprocessing: grayscale, contrast normalization, sharpening, and auto-upscale via sharp for better accuracy on faded/noisy receipts
- Smart text parser: handles 8+ receipt formats (grocery, restaurant, cafe, gas station, pharmacy, bar, European, delivery), OCR artifacts (Ix→1x, O→0, colon/dash decimals), discount/coupon exclusion, fee filtering, and international keywords (TVA, MWST, sous-total)
- Set `AI_PROVIDER=ocr` to use it explicitly, or leave your AI provider configured; OCR kicks in automatically when it's unavailable
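
To illustrate the kind of artifact repair the parser notes describe, here is a minimal sketch, not the project's actual parser. The names `PRICE_LIKE`, `normalizePriceToken`, and `normalizeLine` are hypothetical, and the price pattern is an assumption about what a "price context" looks like (it also treats a comma as a decimal separator, which the notes do not state).

```typescript
// Hypothetical pattern: optional currency sign, 1-4 digits (or O/o misreads),
// a decimal separator (dot, comma, colon, or dash), then exactly two digits.
const PRICE_LIKE = /^[$€£]?[\dOo]{1,4}[.,:\-][\dOo]{2}$/;

// Repair OCR misreads only when the token looks like a price,
// so item names such as "OREO" are left untouched.
function normalizePriceToken(token: string): string {
  if (!PRICE_LIKE.test(token)) return token;
  return token
    .replace(/[Oo]/g, "0") // letter O misread for zero
    .replace(/[:,\-]/, "."); // colon/comma/dash used as the decimal point
}

// Normalize one receipt line: fix the "Ix" quantity misread, then prices.
function normalizeLine(line: string): string {
  return line
    .trim()
    .split(/\s+/)
    .map((token) => (token === "Ix" ? "1x" : normalizePriceToken(token)))
    .join(" ");
}
```

With this sketch, `normalizeLine("Ix OREO COOKIES 3:5O")` yields `"1x OREO COOKIES 3.50"`: the quantity and the price are repaired while the item name keeps its letter O.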
Security Fixes (7 issues closed)
- Receipt ownership: new `uploadedById` field prevents cross-user access to ungrouped receipts
- Guest isolation: `isGuest` flag separates guest and authenticated receipt flows
- Upload validation: magic byte checking (JPEG/PNG/WebP/HEIC) prevents malicious file uploads
- Prompt injection: `correctionHint` sanitized across all AI providers
- OCR stderr: no longer leaked in client-facing error messages
- Rate limiting: uses `cf-connecting-ip` chain to prevent IP spoofing
- Share tokens: truncated in log output
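
For readers unfamiliar with magic byte checking, the sketch below shows the general idea: inspect the first bytes of an upload instead of trusting its extension or declared MIME type. It is an illustration under stated assumptions, not the project's validator; `sniffImageKind`, `HEIC_BRANDS`, and the helper names are hypothetical, and the HEIC brand list is a common subset rather than an exhaustive one.

```typescript
type ImageKind = "jpeg" | "png" | "webp" | "heic";

// Assumed subset of ISO BMFF brands used by HEIC/HEIF files.
const HEIC_BRANDS = new Set(["heic", "heix", "hevc", "hevx", "mif1", "msf1"]);

// Read bytes[start, end) as an ASCII string.
function ascii(bytes: Uint8Array, start: number, end: number): string {
  let out = "";
  for (let i = start; i < end; i++) out += String.fromCharCode(bytes[i]);
  return out;
}

// True when the buffer begins with the given signature bytes.
function startsWith(bytes: Uint8Array, signature: number[]): boolean {
  return signature.every((b, i) => bytes[i] === b);
}

// Return the detected image kind, or null if no known signature matches.
function sniffImageKind(bytes: Uint8Array): ImageKind | null {
  if (bytes.length < 12) return null; // too short to hold any signature checked here
  if (startsWith(bytes, [0xff, 0xd8, 0xff])) return "jpeg";
  if (startsWith(bytes, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "png";
  // WebP: RIFF container with "WEBP" at offset 8.
  if (ascii(bytes, 0, 4) === "RIFF" && ascii(bytes, 8, 12) === "WEBP") return "webp";
  // HEIC: "ftyp" box at offset 4, followed by a major brand.
  if (ascii(bytes, 4, 8) === "ftyp" && HEIC_BRANDS.has(ascii(bytes, 8, 12))) return "heic";
  return null;
}
```

An upload handler could reject any file for which `sniffImageKind` returns `null`, regardless of the filename it arrived with.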
Bug Fixes (13 issues closed)
- Duplicate receipt items: reprocessing no longer appends duplicates
- Correction rescan safety: failed rescans preserve existing items instead of erasing them
- Tax/tip distribution: uses receipt subtotal as denominator for correct proportional shares
- Negative price rejection: schema now enforces non-negative monetary values
- Receipt-to-expense validation: verifies that the payer and assignees are group members and that item IDs belong to the receipt
- Archived group protection: can't create receipt-based expenses on archived groups
- Guest split validation: index bounds checked, blank names filtered with index remapping
- retryProcessing: now actually reprocesses instead of just resetting status
- OpenAI provider: validates API key before construction, model configurable via `OPENAI_MODEL`
- Meridian provider: port configurable via `MERIDIAN_PORT`, `isAvailable()` checks health endpoint
- OCR normalization: O→0 scoped to price context only (no longer corrupts item names)
- Receipt dates: normalized to ISO `YYYY-MM-DD` format
- Profile name update: JWT callback always reads fresh name from DB
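
The tax/tip fix above comes down to which denominator is used. The sketch below is a hedged illustration rather than the project's code: `splitExtras` and its parameter names are hypothetical, amounts are assumed to be integer cents, and rounding drift (shares that sum to a cent more or less than the total) is deliberately not handled.

```typescript
// Distribute tax + tip in proportion to each person's items,
// dividing by the receipt subtotal (all items), not by the sum of assigned items.
function splitExtras(
  itemTotalsCents: Record<string, number>, // per-person sum of assigned items
  subtotalCents: number, // receipt subtotal across all items
  extrasCents: number, // tax + tip to distribute
): Record<string, number> {
  const shares: Record<string, number> = {};
  if (subtotalCents <= 0) return shares; // nothing sensible to divide by
  for (const person of Object.keys(itemTotalsCents)) {
    // Multiply before dividing to keep the arithmetic in integers as long as possible.
    shares[person] = Math.round((itemTotalsCents[person] * extrasCents) / subtotalCents);
  }
  return shares;
}
```

With a $40.00 subtotal and $8.00 of tax and tip, someone whose items total $10.00 owes $2.00 of the extras, whether or not the remaining items have been assigned yet.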
Performance
- E2E test parallelization: ~9 min → ~5 min with serial/parallel project split
- AI provider caching: `isAvailable()` result cached for 60s
- Receipt assignments: batched with `deleteMany` + `createMany` in a single transaction (replaces N×M sequential upserts)
- Middleware: correct `__Secure-` cookie prefix for HTTPS
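
The 60s availability cache can be pictured as a small time-to-live wrapper. This is a sketch under assumptions, not the project's implementation: `cacheFor` is a hypothetical helper, and the injectable `now` clock exists only to make the behavior easy to test. If the wrapped function returns a promise, the promise itself is cached, including a rejected one.

```typescript
// Wrap a function so its result is reused until ttlMs has elapsed.
function cacheFor<T>(
  fn: () => T,
  ttlMs: number,
  now: () => number = Date.now, // injectable clock (assumption, for testing)
): () => T {
  let entry: { value: T; expiresAt: number } | undefined;
  return () => {
    const t = now();
    // Recompute on first use or once the cached entry has expired.
    if (entry === undefined || t >= entry.expiresAt) {
      entry = { value: fn(), expiresAt: t + ttlMs };
    }
    return entry.value;
  };
}
```

A provider could wrap its health check once, for example `const isAvailable = cacheFor(checkHealth, 60_000)` with a hypothetical `checkHealth`, so that repeated calls within a minute do not hit the health endpoint again.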
Testing (130 unit tests, 270+ e2e tests)
- 50 OCR parser unit tests across 8 receipt formats + edge cases
- 35 receipt images: generated, distorted, SROIE scans, Wikimedia photos, Gemini AI photos
- 8 OCR regression tests with pinned Golden Fork Tesseract output
- 8 guest API security/validation tests
- 6 receipt access control e2e tests
Full Changelog: v0.3.1...v0.4.0