github kreuzberg-dev/kreuzberg v3.15.0


🚀 Features

Image Extraction Support

  • Extract images from PDF, HTML, and presentation formats
  • OCR processing for extracted images with configurable backends (Tesseract, EasyOCR, PaddleOCR)
  • Image deduplication and dimension-based filtering
  • New ExtractedImage and ImageOCRResult types for structured image handling
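The notes do not say how deduplication or dimension-based filtering work internally. The sketch below shows one common approach, assuming exact-duplicate detection by hashing image bytes and a simple min/max width-height window. ImageBlob, dedupe_and_filter, and the default bounds are hypothetical names chosen for illustration. They are not kreuzberg's ExtractedImage type or its API.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageBlob:
    """Hypothetical stand-in for an extracted image: raw bytes plus pixel size."""
    data: bytes
    width: int
    height: int


def dedupe_and_filter(
    images: list[ImageBlob],
    min_size: tuple[int, int] = (50, 50),          # hypothetical default bounds
    max_size: tuple[int, int] = (10_000, 10_000),
) -> list[ImageBlob]:
    """Drop byte-identical duplicates, then keep images within the size bounds.

    The order of first occurrence is preserved.
    """
    seen: set[str] = set()
    kept: list[ImageBlob] = []
    min_w, min_h = min_size
    max_w, max_h = max_size
    for img in images:
        # SHA-256 of the raw bytes only catches exact duplicates, not re-encodings.
        digest = hashlib.sha256(img.data).hexdigest()
        if digest in seen:
            continue  # same bytes as an image already processed
        seen.add(digest)
        if not (min_w <= img.width <= max_w and min_h <= img.height <= max_h):
            continue  # too small (icons, spacers) or too large to be worth OCR
        kept.append(img)
    return kept


logo = ImageBlob(b"logo-bytes", 400, 200)
icon = ImageBlob(b"icon-bytes", 16, 16)
print(dedupe_and_filter([logo, icon, logo]))  # only the first logo survives
```

Hashing catches repeated embeddings of the same asset (a logo on every slide, for example) cheaply. Detecting visually similar but re-encoded images would need a perceptual hash instead.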

Configuration Enhancements

  • New ImageOCRConfig class for unified image processing configuration
  • API endpoints now accept image extraction parameters
  • Improved configuration caching and runtime handling
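The notes do not describe the fields of ImageOCRConfig or how configuration caching is implemented. As a rough illustration of the general pattern, a single immutable settings object combined with caching keyed on that object's value, here is a minimal sketch. ImageOCRSettings, resolve_backend, and every field name are hypothetical and are not kreuzberg's actual ImageOCRConfig. Only the three backend names come from these notes.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Literal

# Hypothetical: the three backend names listed in these release notes.
Backend = Literal["tesseract", "easyocr", "paddleocr"]
_BACKENDS = ("tesseract", "easyocr", "paddleocr")


@dataclass(frozen=True)
class ImageOCRSettings:
    """Hypothetical unified settings object (not kreuzberg's ImageOCRConfig)."""
    enabled: bool = True
    backend: Backend = "tesseract"
    min_dimensions: tuple[int, int] = (50, 50)
    max_dimensions: tuple[int, int] = (10_000, 10_000)

    def __post_init__(self) -> None:
        # Validate once at construction so downstream code can trust the values.
        if self.backend not in _BACKENDS:
            raise ValueError(f"unknown OCR backend: {self.backend!r}")
        if any(lo > hi for lo, hi in zip(self.min_dimensions, self.max_dimensions)):
            raise ValueError("min_dimensions must not exceed max_dimensions")


@lru_cache(maxsize=32)
def resolve_backend(settings: ImageOCRSettings) -> str:
    """Run per-config setup once per distinct settings value.

    Frozen dataclasses are hashable, so equal settings share one cache entry.
    """
    # Placeholder for expensive work such as loading an OCR model.
    return f"{settings.backend}-engine"


print(resolve_backend(ImageOCRSettings(backend="easyocr")))
```

Making the settings object frozen is what allows it to serve as a cache key. Two separately constructed but equal configurations hit the same entry instead of triggering setup twice.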

Testing & Documentation

  • Expanded test coverage across all modules
  • Detailed image extraction documentation and examples
  • Performance optimization guidelines
