Device and Language Support
- added Korean wav2vec2 model by @Boulaouaney in #277
- Add Czech alignment model by @Thebys in #280
- Adding Norwegian Bokmål and Norwegian Nynorsk by @peregilk in #636
- Support language names in --language parameter. by @jkukul in #517
- Add align model for catalan language. by @davidmartinrius in #581
- add missing Cantonese in supported languages by @MahmoudAshraf97 in #617
- Add alignment model for Malayalam by @kurianbenoy in #585
- Added Romanian phoneme-based ASR model by @Majdoddin in #791
- added alignment for sk and sl languages by @jan-panoch in #852
- Add wav2vec model for Vietnamese in #278
- Add Urdu model support for alignment by @abCods in #374
- chore(writer): Join words without spaces for ja, zh by @jim60105 in #440
Bug Fixes and Stability Improvements
- fix Unequal Stack Size VAD error by @m-bain in #281
- fix: Bug in type hinting by @VisionOra in #294
- pin faster whisper by @sorgfresser in #474
- Fix repeat transcription on different languages and proper suppress_numerals use by @Joemgu7 in #395
- fix writer fail on segments 0 by @sorgfresser in #429
- fix missing speaker prefix by @invisprints in #438
- fix: correct default_asr_options with new options (patch 0.8) by @remic33 in #458
- Fixes --model_dir path by @canoalberto in #648
- fix: force ctranslate to version 4.4.0 by @Barabazs in #946
- fix: update faster-whisper dependencies by @cococig in #716
- fix: ZeroDivisionError when --print_progress True by @mvoggu in #494
- Minor fixes for word options and subtitles by @amolinasalazar in #549
- fix unboundlocalerror by @sorgfresser in #554
- Fix: Allow vad options to be configurable by passing to FasterWhisperPipeline and merge_chunks. by @abettke in #507
- fix minimum input length for torch wav2vec2 models by @MahmoudAshraf97 in #510
- fix(diarize): key error on empty track by @characat0 in #518
- pip compliance for git+ installs by @spbisc97 in #603
Documentation Updates
- adds link to whisperX medium on replicate.com by @CaRniFeXeR in #431
- Document --compute_type command line option by @dotgrid in #430
- adding link to Replicate demo by @daanelson in #352
- fix: typo in error message by @zamoshchin in #493
- Fix link in README.md by @jimregan in #668
- Update README.md by @valentt in #509
- Add a special note about Speaker-Diarization-3.0 in readme by @kaihe-stori in #521
- Update README to correct speaker diarization version link by @gillens in #618
- Update README.md by @mlopsengr in #630
- fix link by @M0HID in #605
- Remove torchvision from README by @baer in #378
Miscellaneous Changes
- move model to assets by @m-bain in #945
- Update alignment.py by @Ayushi-Desynova in #418
- Update alignment.py by @awerks in #427
- push contributions from main by @m-bain in #290
- make diarization faster by @davidas1 in #400
- Add device_index option by @sorgfresser in #266
- Add transcribe keywords by @sorgfresser in #269
- Added download path parameter. by @prameshbajra in #284
- Suppress numerals by @m-bain in #303
- Add Audacity export by @Ca-ressemble-a-du-fake in #309
- Update transcribe.py -> small change in batch_size description by @mabergerx in #382
- Suggest using pytorch-cuda 11.8 instead of 11.7 by @tijszwinkels in #255
- feat: Add merge chunks chunk_size as arguments. by @jim60105 in #445
- A solution to long subtitles and words without timestamps by @awerks in #459
- chore(writer): improve text display(ja etc) in json file by @darwintree in #472
- add faster whisper threading by @sorgfresser in #473
- Pyannote3 by @remic33 in #492
- Update alignment.py by @piuy11 in #487
- Pass patience and beam_size to faster-whisper. by @jkukul in #527
- remove the minimum length for alignment and print the failing segment by @MahmoudAshraf97 in #529
- Update setup.py to use pyannote.audio version with working GPU by @wuurrd in #531
- Update setup.py to download pyannote depending on platform by @justinwlin in #541
- Drop ffmpeg-python dependency and call ffmpeg directly. by @hidenori-endo in #570
- no align based on space by @sorgfresser in #556
- Update asr.py and make the model parameter be used by @kaka1909 in #580
- Move load_model after WhisperModel by @DougTrajano in #584
- Update pyannote to 3.1.0 by @remic33 in #586
- support for large-v3 by @MahmoudAshraf97 in #599
- Added option to load Custom VAD model to load model method by @Swami-Abhinav in #654
- Update pyannote to v3.1.1 to fix a diarization problem (and diarize.py) by @santialferez in #646
- Get rid of numeral_symbol_tokens variable in printed message by @KossaiSbai in #669
- Add Replicate large-v3 demo by @victor-upmeet in #703
- local vad model by @m-bain in #944
- Feat: add new align models - SHORT by @Equipo45 in #922
- Update alignment.py by @peregilk in #687
Full Changelog: v3.1.1...v3.2.0