Release v3.6.0
🚀 New Features
Entity Extraction & Named Entity Recognition
- NEW: Added support for entity extraction using the spaCy NLP library
- NEW: Support for custom entity patterns using regex
- NEW: Configurable entity extraction with SpacyEntityExtractionConfig
- Enable with extract_entities=True in ExtractionConfig
- Results available in ExtractionResult.entities
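The idea behind regex-based custom entity patterns can be pictured with a small standard-library sketch. It assumes patterns are supplied as a mapping from entity type to regular expression and that each match is reported with its character offsets. The Entity class and extract_custom_patterns function are hypothetical illustrations, not Kreuzberg's API, and spaCy is not involved here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical entity record: label, matched text, and character offsets."""

    type: str
    text: str
    start: int
    end: int


def extract_custom_patterns(text: str, patterns: dict[str, str]) -> list[Entity]:
    """Run each regex over the text and return all matches ordered by position.

    Hypothetical helper for illustration only; not Kreuzberg's implementation.
    """
    entities: list[Entity] = []
    for entity_type, pattern in patterns.items():
        for match in re.finditer(pattern, text):
            entities.append(Entity(entity_type, match.group(), match.start(), match.end()))
    return sorted(entities, key=lambda entity: entity.start)


if __name__ == "__main__":
    sample = "Invoice INV-2024-001 was sent to billing@example.com."
    found = extract_custom_patterns(
        sample,
        {
            "INVOICE_ID": r"INV-\d{4}-\d{3}",
            # The repeated (?:\.[\w-]+) group stops before a trailing period.
            "EMAIL": r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+",
        },
    )
    for entity in found:
        print(entity)
```

Keeping the offsets alongside the matched text lets callers map each entity back to its position in the extracted content.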
Keyword Extraction
- NEW: Added keyword extraction capabilities using KeyBERT
- NEW: Configurable number of keywords to extract
- Enable with extract_keywords=True in ExtractionConfig
- Results available in ExtractionResult.keywords, with confidence scores
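The result shape, a configurable number of keywords each paired with a score, can be illustrated without KeyBERT. The sketch below assumes results look like the list of (keyword, score) tuples that KeyBERT's extract_keywords returns, and borrows KeyBERT's top_n parameter name. It swaps in a plain word-frequency score; KeyBERT itself ranks candidates by embedding similarity to the document, so this is only a stand-in for the output format. top_keywords and STOPWORDS are hypothetical names.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a much larger one.
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "from", "to", "in", "is", "with"})


def top_keywords(text: str, top_n: int = 10) -> list[tuple[str, float]]:
    """Return up to top_n (keyword, score) pairs, highest score first.

    Frequency-based stand-in: a word's score is its count divided by the count
    of the most frequent word, so every score falls in (0, 1].
    """
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    if not words:
        return []
    counts = Counter(words)
    highest = counts.most_common(1)[0][1]
    return [(word, count / highest) for word, count in counts.most_common(top_n)]
```

Capping the list with top_n mirrors the "configurable number of keywords" option; the scores play the role of the confidence values reported in ExtractionResult.keywords.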
Language Detection Integration
- FIXED: Language detection now properly integrated into the extraction pipeline
- Automatic language detection when auto_detect_language=True
- Results available in ExtractionResult.detected_languages
- Proper error handling for missing dependencies
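The integration fix makes language detection a step that runs after text extraction whenever auto_detect_language=True is set. A minimal sketch of such a step, assuming the detector is passed in as a function (in this release the detection itself comes from fast-langdetect). DemoConfig, DemoResult, and apply_language_detection are hypothetical stand-ins, not Kreuzberg's classes.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class DemoConfig:
    """Hypothetical stand-in for ExtractionConfig (only the relevant flag)."""

    auto_detect_language: bool = False


@dataclass(frozen=True)
class DemoResult:
    """Hypothetical stand-in for ExtractionResult."""

    content: str
    detected_languages: Optional[list[str]] = None


def apply_language_detection(
    result: DemoResult,
    config: DemoConfig,
    detect: Callable[[str], list[str]],
) -> DemoResult:
    """Post-processing step: fill detected_languages only when the flag is set."""
    if not config.auto_detect_language:
        return result
    # Return a new result instead of mutating the frozen one.
    return replace(result, detected_languages=detect(result.content))
```

Injecting the detector keeps the step testable with a fake function and leaves the real dependency to be imported only when detection is actually enabled.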
🔧 Improvements
- ENHANCEMENT: Replaced GLiNER with spaCy for better entity extraction performance
- ENHANCEMENT: Improved post-processing pipeline architecture
- ENHANCEMENT: Better separation of sync and async processing logic
- ENHANCEMENT: Enhanced error handling for optional dependencies
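One common way to separate sync and async processing is to keep the post-processing logic in plain synchronous helpers and have the async entry point delegate to them. The sketch assumes that design; it is not taken from Kreuzberg's source, and DemoResult, post_process_sync, and post_process_async are hypothetical names. asyncio.to_thread requires Python 3.9 or later.

```python
import asyncio
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DemoResult:
    """Hypothetical stand-in for ExtractionResult."""

    content: str


def post_process_sync(result: DemoResult) -> DemoResult:
    """Shared, blocking post-processing logic (here: whitespace normalization)."""
    return replace(result, content=" ".join(result.content.split()))


async def post_process_async(result: DemoResult) -> DemoResult:
    """Async entry point: run the same sync helper in a worker thread."""
    return await asyncio.to_thread(post_process_sync, result)
```

Because both paths call one helper, a fix to the post-processing logic applies to sync and async callers alike.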
🛠️ Technical Changes
- Abstracted validation and post-processing logic into helper functions
- Improved dependency management for optional features
- Enhanced Docker workflow reliability
- Updated AI assistant configuration
- Improved exception handling for missing dependencies
📦 Dependencies
New Optional Dependencies
- spacy: for entity extraction (install with kreuzberg[entities])
- keybert: for keyword extraction (install with kreuzberg[keywords])
- fast-langdetect: for language detection (install with kreuzberg[langdetect])
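The optional-dependency handling can be sketched as a check that looks up the module behind each extra and raises a descriptive error before any work starts. The extra-to-module mapping follows the dependency list above (the fast-langdetect package imports as fast_langdetect). The MissingDependencyError defined here is a local stand-in for the library's exception of that name, and require_feature is a hypothetical helper.

```python
import importlib.util


class MissingDependencyError(ImportError):
    """Local stand-in for the library's exception of the same name."""


# Extra name -> importable module name, mirroring the optional dependency list.
OPTIONAL_FEATURES = {
    "entities": "spacy",
    "keywords": "keybert",
    "langdetect": "fast_langdetect",
}


def require_feature(extra: str) -> None:
    """Raise MissingDependencyError if the module behind `extra` is not installed."""
    module_name = OPTIONAL_FEATURES[extra]
    # find_spec returns None for a top-level module that cannot be found.
    if importlib.util.find_spec(module_name) is None:
        raise MissingDependencyError(
            f"{module_name} is not installed; install it with: "
            f'pip install "kreuzberg[{extra}]"'
        )
```

Checking with find_spec avoids importing heavy packages such as spaCy just to learn whether they are available, and the error message points the user at the matching extra.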
Installation Examples
# Basic installation
pip install kreuzberg
# With entity extraction
pip install "kreuzberg[entities]"
# With keyword extraction
pip install "kreuzberg[keywords]"
# With language detection
pip install "kreuzberg[langdetect]"
# All features
pip install "kreuzberg[all]"
🐛 Bug Fixes
- FIXED: Language detection now properly called during extraction
- FIXED: MissingDependencyError properly raised for missing optional dependencies
- FIXED: CI test failures related to language detection
- FIXED: Docker workflow reliability issues
💥 Breaking Changes
None - all changes are backward compatible.
📚 Documentation
- Added documentation for entity extraction configuration
- Added documentation for keyword extraction features
- Enhanced examples for new extraction capabilities
Full Changelog: v3.5.0...v3.6.0