License
- 0BSD license — the project license has been changed from MIT to 0BSD, a maximally permissive license with no attribution requirement. All prior 7.x releases should also be considered 0BSD licensed as of this release.
Features
- Added `mime_type` field to detection results, which identifies file types for both binary (via magic number matching) and text content. Returned in all `detect()`, `detect_all()`, and `UniversalDetector` results. (#350)
- New `pipeline/magic.py` module detects 40+ binary file formats including images, audio/video, archives, documents, executables, and fonts. ZIP-based formats (XLSX, DOCX, JAR, APK, EPUB, wheel, OpenDocument) are distinguished by entry filenames. (#350)
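
The two ideas in the entries above, magic number matching and telling ZIP-based formats apart by their entry filenames, can be illustrated with a minimal standard-library sketch. This is not the code in `pipeline/magic.py`: the names `sniff_mime`, `sniff_zip`, and `MAGIC_SIGNATURES` are hypothetical, the signature table is a tiny sample, and a real detector would likely inspect only a prefix of the data rather than open the whole archive.

```python
import io
import zipfile
from typing import Optional

# Hypothetical sample table: (leading bytes, MIME type).
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]

DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def sniff_zip(data: bytes) -> str:
    """Refine a ZIP container by looking at the names of its entries."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = archive.namelist()
    except zipfile.BadZipFile:
        # Truncated or damaged archive: fall back to the generic type.
        return "application/zip"
    if any(name.startswith("word/") for name in names):
        return DOCX_MIME
    if any(name.startswith("xl/") for name in names):
        return XLSX_MIME
    if "META-INF/MANIFEST.MF" in names:
        return "application/java-archive"
    return "application/zip"


def sniff_mime(data: bytes) -> Optional[str]:
    """Return a MIME type for known binary signatures, else None."""
    if data.startswith(b"PK\x03\x04"):  # ZIP local file header
        return sniff_zip(data)
    for signature, mime in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return mime
    return None
```

Returning `None` for unrecognized input leaves room for a separate text-oriented step to classify the content.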
Bug Fixes
- Fixed incorrect equivalence between UTF-16-LE and UTF-16-BE in accuracy testing — these are distinct encodings with different byte order, not interchangeable
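
A short illustration of why the two encodings cannot be treated as equivalent, using only Python's built-in codecs. The sample string is arbitrary.

```python
text = "Hi"

little = text.encode("utf-16-le")  # b"H\x00i\x00"
big = text.encode("utf-16-be")     # b"\x00H\x00i"

# Same text, different byte sequences.
assert little != big

# Decoding with the wrong byte order does not raise here,
# but it silently produces different characters.
swapped = little.decode("utf-16-be")
assert swapped != text
```

A test harness that accepts either label as correct would therefore count a byte-swapped decode as a success.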
Performance
- Added 4 new modules to mypyc compilation (orchestrator, confusion, magic, ascii), bringing the total to 11 compiled modules
- Capped statistical scoring at 16 KB — bigram models converge quickly, so large files no longer score the full 200 KB. Worst-case detection time dropped from 62ms to 26ms with no accuracy loss.
- Replaced `dataclasses.replace()` with direct `DetectionResult` construction on hot paths, eliminating ~354k function calls per full test suite run
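
The scoring cap and the `dataclasses.replace()` change can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the `DetectionResult` fields shown here are hypothetical, as are the helper names `scoring_window`, `with_confidence_replace`, and `with_confidence_direct`.

```python
import dataclasses
from dataclasses import dataclass
from typing import Optional

SCORING_CAP = 16 * 1024  # 16 KB cap described in the notes above


@dataclass(frozen=True)
class DetectionResult:
    # Hypothetical field set, for illustration only.
    encoding: Optional[str]
    confidence: float
    language: Optional[str] = None
    mime_type: Optional[str] = None


def scoring_window(data: bytes) -> bytes:
    """Return only the prefix that statistical scoring would examine."""
    return data[:SCORING_CAP]


def with_confidence_replace(result: DetectionResult, confidence: float) -> DetectionResult:
    # Generic path: dataclasses.replace() inspects the fields at call time.
    return dataclasses.replace(result, confidence=confidence)


def with_confidence_direct(result: DetectionResult, confidence: float) -> DetectionResult:
    # Hot-path alternative: call the constructor directly.
    return DetectionResult(
        result.encoding, confidence, result.language, result.mime_type
    )
```

Both helpers return equal objects. The direct form trades a little repetition for fewer function calls, and it must be updated by hand if fields are added.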