- Voice Activity Control (VAC) by Default: VAC is now enabled by default to improve transcription accuracy by filtering out non-speech segments before transcription and diarization run. You can disable it with the `--no-vac` flag.
- Simulstreaming Backend Enhancements:
  - The `simulstreaming` backend is now the default transcription backend.
  - Improved timestamp accuracy for audio segments longer than 30 seconds.
  - Backend models are now recycled to optimize resource usage, by removing Whisper hooks at the end of each transcription.
  - Added the ability to preload multiple backend models with the `--preloaded_model_count` argument, which is useful when several concurrent users are expected.
- Diarization with Silences: The `diart` diarization backend now correctly handles pauses and silences, improving speaker-turn detection.
- Time Handling: Aligned time handling between the backend and the frontend for better synchronization.
- WebSocket Communication: Buffering is disabled during silent periods.
- Default Model: The default model is now `base`.
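Enabling VAC by default means silent stretches are dropped before they reach transcription and diarization. The sketch below illustrates that gating idea. It uses a simple RMS energy threshold as a swapped-in stand-in for a trained voice-activity model, so it is not the project's actual VAC. The names `frame_is_speech`, `filter_speech`, `preprocess`, `use_vac`, and the threshold value are hypothetical. `use_vac=False` only mimics the effect of `--no-vac`.

```python
import math
from typing import Iterable, List, Sequence

# Hypothetical RMS threshold; a real VAC relies on a trained voice-activity model instead.
ENERGY_THRESHOLD = 0.02


def frame_is_speech(frame: Sequence[float], threshold: float = ENERGY_THRESHOLD) -> bool:
    """Return True if the frame's RMS energy exceeds the threshold (energy-based stand-in for VAC)."""
    if not frame:
        return False
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return rms > threshold


def filter_speech(frames: Iterable[Sequence[float]],
                  threshold: float = ENERGY_THRESHOLD) -> List[Sequence[float]]:
    """Keep only frames judged to be speech, so silence never reaches ASR or diarization."""
    return [f for f in frames if frame_is_speech(f, threshold)]


def preprocess(frames: Iterable[Sequence[float]], use_vac: bool = True) -> List[Sequence[float]]:
    """Hypothetical pipeline entry point; use_vac=False mimics running with --no-vac."""
    frames = list(frames)
    return filter_speech(frames) if use_vac else frames


if __name__ == "__main__":
    silence = [0.0] * 160
    speech = [0.5, -0.5] * 80
    frames = [silence, speech, silence]
    print(len(preprocess(frames)))                 # 1
    print(len(preprocess(frames, use_vac=False)))  # 3
```

An energy threshold was chosen only because it is short and dependency-free. Real speech detectors are far more robust to background noise.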
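Model recycling and `--preloaded_model_count` together describe a pool: several models are loaded up front, handed to sessions, and returned with their per-session hooks stripped. The following is a minimal sketch of that pattern, assuming hooks can simply be cleared on release. `FakeModel`, `ModelPool`, `acquire`, and `release` are hypothetical names, not the project's API. The fallback of loading an extra model when all preloaded ones are busy is also an assumption.

```python
import queue
import threading
from typing import Callable, List


class FakeModel:
    """Hypothetical stand-in for a loaded Whisper model with attachable hooks."""

    def __init__(self, model_id: int):
        self.model_id = model_id
        self.hooks: List[Callable[[], None]] = []

    def register_hook(self, hook: Callable[[], None]) -> None:
        self.hooks.append(hook)

    def remove_hooks(self) -> None:
        self.hooks.clear()


class ModelPool:
    """Preloads `preloaded_model_count` models and recycles them between sessions."""

    def __init__(self, factory: Callable[[int], FakeModel], preloaded_model_count: int = 1):
        self._factory = factory
        self._idle: "queue.Queue[FakeModel]" = queue.Queue()
        self._lock = threading.Lock()
        for i in range(preloaded_model_count):
            self._idle.put(factory(i))
        self._created = preloaded_model_count

    def acquire(self) -> FakeModel:
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            # Assumption: when every preloaded model is busy, load an extra one on demand.
            with self._lock:
                model_id = self._created
                self._created += 1
            return self._factory(model_id)

    def release(self, model: FakeModel) -> None:
        # Strip per-session hooks so the next session starts clean, then recycle the model.
        model.remove_hooks()
        self._idle.put(model)


if __name__ == "__main__":
    pool = ModelPool(FakeModel, preloaded_model_count=2)
    a, b, c = pool.acquire(), pool.acquire(), pool.acquire()
    print(a.model_id, b.model_id, c.model_id)  # 0 1 2
    a.register_hook(lambda: None)
    pool.release(a)
    d = pool.acquire()
    print(d.model_id, len(d.hooks))  # 0 0
```

Preloading trades startup time and memory for lower latency when several users connect at once. Clearing hooks on release prevents one session's state from leaking into the next.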