github kreuzberg-dev/kreuzberg v3.0.0

9 months ago

Enhancements:

  • added support for multiple OCR backends: PaddleOCR and EasyOCR (feature)
  • added support for having no OCR backend (feature)
  • made Tesseract OCR optional (enhancement)
  • added support for creating and registering custom extractors (feature)
  • added support for overriding builtin extractors (feature)
  • added support for post-processing hooks (feature)
  • added support for validation hooks (feature)
  • added PDF metadata extraction using Playa-PDF (feature)
  • added optional chunking support (feature)
  • added documentation site (documentation)

Breaking Changes:

  • Changed ExtractionResults from NamedTuple to TypedDict (breaking change; api)
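
To illustrate what this kind of change means for calling code, here is a minimal sketch using stand-in types. `OldResult` and `NewResult` are hypothetical, and the `content` and `mime_type` fields are assumed for illustration; they are not taken from the library's actual definitions.

```python
from typing import NamedTuple, TypedDict


class OldResult(NamedTuple):
    # Hypothetical stand-in for a NamedTuple-style result; field names are assumed.
    content: str
    mime_type: str


class NewResult(TypedDict):
    # Hypothetical stand-in for a TypedDict-style result with the same assumed fields.
    content: str
    mime_type: str


old = OldResult(content="hello", mime_type="text/plain")
new: NewResult = {"content": "hello", "mime_type": "text/plain"}

# NamedTuple: attribute access and positional unpacking both work.
old_content = old.content
text, mime = old

# TypedDict: a plain dict at runtime, so use key access instead.
new_content = new["content"]
text2, mime2 = new["content"], new["mime_type"]

# Unpacking a dict iterates over its keys, not its values, so code
# written as `content, mime_type = result` silently changes meaning.
first, second = new
```

With a change of this shape, attribute access such as `result.content` fails with an `AttributeError`, while tuple unpacking keeps running but yields key names instead of values, so unpacking call sites are the ones to audit first.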

Internal:

  • Reworked internals into a class-based architecture to allow extensibility (internal; architecture)
