New Features
- New batched inference that is 4x faster while remaining accurate. Refer to the README for usage instructions.
- Support for the new `large-v3-turbo` model.
- VAD filter is now 3x faster on CPU.
- Feature Extraction is now 3x faster.
- Added `log_progress` to `WhisperModel.transcribe` to print transcription progress.
- Added `multilingual` option to transcription to allow transcribing multilingual audio. Note that large models already have code-switching capabilities, so this is mostly beneficial to the `medium` model or smaller.
- `WhisperModel.detect_language` now has the option to use the VAD filter, and language detection is improved using `language_detection_segments` and `language_detection_threshold`.
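As a rough illustration of how the new options might be combined, here is a minimal sketch. The `log_progress` and `multilingual` arguments and the `large-v3-turbo` model name come from the notes above. The `BatchedInferencePipeline` class name and its `batch_size` argument are assumed from the project README. The helper `transcribe_file` and the audio path are hypothetical.

```python
from typing import List, Tuple


def transcribe_file(
    audio_path: str, batched: bool = False, batch_size: int = 16
) -> List[Tuple[float, float, str]]:
    """Transcribe one file and return (start, end, text) tuples.

    Hypothetical helper, not part of faster-whisper itself.
    """
    # Imported lazily so this sketch can be loaded without the package installed.
    from faster_whisper import BatchedInferencePipeline, WhisperModel

    model = WhisperModel("large-v3-turbo", device="auto", compute_type="default")

    if batched:
        # Assumption: batched inference is exposed through a pipeline wrapper,
        # as described in the README.
        pipeline = BatchedInferencePipeline(model=model)
        segments, _info = pipeline.transcribe(audio_path, batch_size=batch_size)
    else:
        # log_progress prints transcription progress; multilingual allows
        # transcribing audio that contains more than one language.
        segments, _info = model.transcribe(
            audio_path, log_progress=True, multilingual=True
        )

    # segments is lazy, so transcription actually runs while iterating here.
    return [(segment.start, segment.end, segment.text) for segment in segments]
```

Calling `transcribe_file("audio.mp3", batched=True)` would then run the batched path. Whether batching helps in practice depends on the hardware and the chosen `batch_size`.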
Bug Fixes
- Use correct features padding for encoder input when `chunk_length` < 30s
- Use correct `seek` value in output
Other Changes
- Replace `NamedTuple` with `dataclass` in `Word`, `Segment`, `TranscriptionOptions`, `TranscriptionInfo`, and `VadOptions`. This allows conversion to `json` without nesting. Note that the `_asdict()` method is still available in the `Word` and `Segment` classes for backward compatibility but will be removed in the next release. You can use `dataclasses.asdict()` instead.
- Added new tests for development
- Updated benchmarks in the Readme
- Use `jiwer` instead of `evaluate` in benchmarks
- Filter out non_speech_tokens in suppressed tokens by @jordimas in #898
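To illustrate the move from `_asdict()` to `dataclasses.asdict()`, the sketch below uses simplified stand-in dataclasses. The `Word` and `Segment` definitions here are hypothetical and carry only a few illustrative fields; the real classes in the library have more.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


# Hypothetical, simplified stand-ins for the library's classes.
@dataclass
class Word:
    start: float
    end: float
    word: str
    probability: float


@dataclass
class Segment:
    id: int
    start: float
    end: float
    text: str
    words: Optional[List[Word]] = None


segment = Segment(
    id=1,
    start=0.0,
    end=1.5,
    text=" Hello world",
    words=[
        Word(start=0.0, end=0.7, word=" Hello", probability=0.98),
        Word(start=0.7, end=1.5, word=" world", probability=0.95),
    ],
)

# asdict() recurses into nested dataclasses, including those inside lists,
# so the result is made of plain dicts and lists that json can serialize.
segment_dict = asdict(segment)
segment_json = json.dumps(segment_dict)
```

Because `asdict()` converts the nested `Word` instances as well, no manual handling of the `words` list is needed before serializing.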
New Contributors
- @Jiltseb made their first contribution in #856
- @heimoshuiyu made their first contribution in #1092
Full Changelog: v1.0.3...v1.1.0