github kreuzberg-dev/kreuzberg v3.6.0
Release v3.6.0

5 months ago

🚀 New Features

Entity Extraction & Named Entity Recognition

  • NEW: Added support for entity extraction using spaCy NLP library
  • NEW: Support for custom entity patterns using regex
  • NEW: Configurable entity extraction with SpacyEntityExtractionConfig
  • Enable with extract_entities=True in ExtractionConfig
  • Results available in ExtractionResult.entities
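As a rough illustration of what regex-based custom entity patterns do, the sketch below applies a label-to-regex mapping to plain text. It is a standalone stand-in written with Python's re module, not kreuzberg's implementation; the Entity tuple, the match_custom_patterns helper, and the sample patterns are all hypothetical names chosen for this example.

```python
import re
from typing import NamedTuple


class Entity(NamedTuple):
    """Hypothetical result record; not kreuzberg's own entity type."""
    label: str
    text: str
    start: int
    end: int


def match_custom_patterns(text: str, patterns: dict[str, str]) -> list[Entity]:
    """Return every regex match as a labelled entity, ordered by position."""
    entities = []
    for label, pattern in patterns.items():
        for match in re.finditer(pattern, text):
            entities.append(Entity(label, match.group(0), match.start(), match.end()))
    return sorted(entities, key=lambda entity: entity.start)


# Illustrative patterns only; real patterns depend on the documents processed.
patterns = {
    "INVOICE_ID": r"INV-\d{4}",
    "EMAIL": r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+",
}

found = match_custom_patterns("Send INV-0042 to billing@example.com.", patterns)
for entity in found:
    print(entity.label, entity.text)
```

Pattern-based matching of this kind complements a statistical model such as spaCy's, since it catches domain-specific identifiers that a general-purpose model is unlikely to label.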

Keyword Extraction

  • NEW: Added keyword extraction capabilities using KeyBERT
  • NEW: Configurable number of keywords to extract
  • Enable with extract_keywords=True in ExtractionConfig
  • Results available in ExtractionResult.keywords with confidence scores
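Because keywords come back with confidence scores, a common follow-up step is to drop low-scoring entries. The sketch below assumes the keywords are (keyword, score) pairs with higher scores meaning more relevant; that shape is an assumption, and filter_keywords and the sample data are hypothetical.

```python
from typing import Optional


def filter_keywords(
    keywords: list[tuple[str, float]],
    min_score: float = 0.3,
    limit: Optional[int] = None,
) -> list[tuple[str, float]]:
    """Keep keywords scoring at least min_score, highest score first."""
    ranked = sorted(
        (pair for pair in keywords if pair[1] >= min_score),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked if limit is None else ranked[:limit]


# Made-up scores standing in for ExtractionResult.keywords (assumed shape).
sample = [("ocr", 0.71), ("pdf", 0.52), ("the", 0.05), ("extraction", 0.64)]
top = filter_keywords(sample, min_score=0.3, limit=2)
print(top)
```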

Language Detection Integration

  • FIXED: Language detection now properly integrated into extraction pipeline
  • Automatic language detection when auto_detect_language=True
  • Results available in ExtractionResult.detected_languages
  • Proper error handling for missing dependencies
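The three flags described above might be combined as in the following sketch. The flag names and result attributes come from these notes; the extract_file_sync entry point and its config keyword are assumptions about the library's API that are not stated here, and feature_flags and extract_enriched are hypothetical helpers. The kreuzberg import is deferred so the helper module loads even when the package is not installed.

```python
from typing import Any


def feature_flags(
    entities: bool = False,
    keywords: bool = False,
    languages: bool = False,
) -> dict[str, bool]:
    """Map feature choices to the ExtractionConfig flags named in these notes."""
    return {
        "extract_entities": entities,
        "extract_keywords": keywords,
        "auto_detect_language": languages,
    }


def extract_enriched(path: str) -> dict[str, Any]:
    """Run an extraction with all three optional features enabled."""
    # Deferred import: requires kreuzberg plus the relevant optional extras.
    # extract_file_sync(path, config=...) is an assumed call signature.
    from kreuzberg import ExtractionConfig, extract_file_sync

    config = ExtractionConfig(
        **feature_flags(entities=True, keywords=True, languages=True)
    )
    result = extract_file_sync(path, config=config)
    return {
        "entities": result.entities,
        "keywords": result.keywords,
        "detected_languages": result.detected_languages,
    }
```

Calling extract_enriched("report.pdf") would then return the three result fields in one dictionary, provided the optional dependencies listed under Dependencies are installed.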

🔧 Improvements

  • ENHANCEMENT: Replaced GLiNER with spaCy for better entity extraction performance
  • ENHANCEMENT: Improved post-processing pipeline architecture
  • ENHANCEMENT: Better separation of sync and async processing logic
  • ENHANCEMENT: Enhanced error handling for optional dependencies

🛠️ Technical Changes

  • Abstracted validation and post-processing logic into helper functions
  • Improved dependency management for optional features
  • Enhanced Docker workflow reliability
  • Updated AI assistant configuration
  • Improved exception handling for missing dependencies

📦 Dependencies

New Optional Dependencies

  • spacy: For entity extraction (install with kreuzberg[entities])
  • keybert: For keyword extraction (install with kreuzberg[keywords])
  • fast-langdetect: For language detection (install with kreuzberg[langdetect])

Installation Examples

# Basic installation
pip install kreuzberg

# With entity extraction
pip install "kreuzberg[entities]"

# With keyword extraction  
pip install "kreuzberg[keywords]"

# With language detection
pip install "kreuzberg[langdetect]"

# All features
pip install "kreuzberg[all]"

🐛 Bug Fixes

  • FIXED: Language detection now properly called during extraction
  • FIXED: MissingDependencyError properly raised for missing optional dependencies
  • FIXED: CI test failures related to language detection
  • FIXED: Docker workflow reliability issues
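The missing-dependency behaviour can be pictured with a generic guard like the one below. This is a minimal sketch of the general pattern, not the library's code: OptionalDependencyMissing is a local stand-in that only plays the role of MissingDependencyError, and require_optional is a hypothetical helper.

```python
import importlib.util


class OptionalDependencyMissing(ImportError):
    """Local stand-in for the role of kreuzberg's MissingDependencyError."""


def require_optional(module_name: str, extra: str) -> None:
    """Raise a descriptive error if a top-level optional module is absent."""
    if importlib.util.find_spec(module_name) is None:
        raise OptionalDependencyMissing(
            f"'{module_name}' is required for this feature. "
            f'Install it with: pip install "kreuzberg[{extra}]"'
        )


try:
    # A deliberately nonexistent module name, to trigger the error path.
    require_optional("no_such_module_kreuzberg_demo", "entities")
except OptionalDependencyMissing as exc:
    print(exc)
```

Checking with find_spec before the feature runs turns a bare ImportError deep in the call stack into a message that names the extra to install.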

💥 Breaking Changes

None - all changes are backward compatible.

📚 Documentation

  • Added documentation for entity extraction configuration
  • Added documentation for keyword extraction features
  • Enhanced examples for new extraction capabilities

Full Changelog: v3.5.0...v3.6.0
