New Features
- New batched inference that is 4x faster and remains accurate. Refer to the README for usage instructions.
- Support for the new large-v3-turbo model.
- VAD filter is now 3x faster on CPU.
- Feature Extraction is now 3x faster.
- Added log_progress to WhisperModel.transcribe to print transcription progress.
- Added multilingual option to transcription to allow transcribing multilingual audio. Note that large models already have code-switching capabilities, so this is mostly beneficial to the medium model or smaller.
- WhisperModel.detect_language now has the option to use the VAD filter and offers improved language detection using language_detection_segments and language_detection_threshold.
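The options listed above could be combined roughly as follows. This is a minimal sketch rather than official usage: the audio path, batch size, and the values passed to language_detection_segments and language_detection_threshold are illustrative assumptions, and the exact signatures should be checked against the README for the installed version.

```python
def transcribe_examples(audio_path: str = "audio.mp3") -> None:
    """Sketch of the new options; audio_path is a placeholder file name."""
    # Imported lazily so this sketch can be loaded without the package installed.
    from faster_whisper import BatchedInferencePipeline, WhisperModel, decode_audio

    model = WhisperModel("large-v3-turbo")

    # Batched inference: wrap the model and pass a batch size (16 is an assumption).
    batched_model = BatchedInferencePipeline(model=model)
    segments, info = batched_model.transcribe(audio_path, batch_size=16)
    for segment in segments:
        print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")

    # Sequential transcription with progress output and multilingual mode.
    segments, info = model.transcribe(
        audio_path,
        log_progress=True,
        multilingual=True,
    )
    for segment in segments:
        print(segment.text)

    # Language detection with the VAD filter over several segments.
    # The values 4 and 0.5 are illustrative, not recommendations.
    audio = decode_audio(audio_path)
    language, probability, _all_probs = model.detect_language(
        audio,
        vad_filter=True,
        language_detection_segments=4,
        language_detection_threshold=0.5,
    )
    print(f"Detected {language} with probability {probability:.2f}")
```

Calling transcribe_examples("path/to/file.mp3") downloads the model on first use, so it is best tried on a machine with enough memory for large-v3-turbo.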
Bug Fixes
- Use correct features padding for encoder input when chunk_length < 30s
- Use correct seek value in output
Other Changes
- Replaced NamedTuple with dataclass in Word, Segment, TranscriptionOptions, TranscriptionInfo, and VadOptions; this allows conversion to json without nesting. Note that the _asdict() method is still available in the Word and Segment classes for backward compatibility but will be removed in the next release; use dataclasses.asdict() instead.
- Added new tests for development
- Updated benchmarks in the Readme
- Use jiwer instead of evaluate in benchmarks
- Filter out non_speech_tokens in suppressed tokens by @jordimas in #898
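The move from NamedTuple to dataclass under Other Changes can be illustrated with the standard library alone. The Word and Segment classes below are simplified, hypothetical stand-ins (the real classes carry more fields); the sketch only shows how dataclasses.asdict() converts a nested structure into something json can serialize directly.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


# Simplified stand-ins for illustration only; not the library's definitions.
@dataclass
class Word:
    start: float
    end: float
    word: str
    probability: float


@dataclass
class Segment:
    id: int
    start: float
    end: float
    text: str
    words: Optional[List[Word]] = None


segment = Segment(
    id=1,
    start=0.0,
    end=1.2,
    text=" Hello world",
    words=[
        Word(start=0.0, end=0.5, word=" Hello", probability=0.98),
        Word(start=0.5, end=1.2, word=" world", probability=0.95),
    ],
)

# asdict() recurses into nested dataclasses and lists, so the result
# consists of plain dicts and lists that json.dumps accepts as-is.
segment_dict = asdict(segment)
segment_json = json.dumps(segment_dict)
```

Code that currently calls segment._asdict() can switch to asdict(segment) ahead of the planned removal.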
New Contributors
- @Jiltseb made their first contribution in #856
- @heimoshuiyu made their first contribution in #1092
Full Changelog: v1.0.3...v1.1.0