Overview
This release should be considered Beta: I haven't tested it extensively, so some things may be broken.
Overall, though, I believe both performance and quality have improved.
- Added Core ML support #566
- Restored decoding fallbacks with default size of 2 instead of 5 (f19e23f)
- Pad the audio with zeros instead of the spectrogram (5108b30)
- Added talk-llama example
- Added `whisper_state`, which allows parallel transcriptions with a single model in memory (#523)
The C-style API has been extended significantly to support the new `whisper_state`, but in general should be backwards compatible.
The only breaking change is in the callback signatures.
Please provide feedback in the discussion if you observe any issues.
The next release, v1.4.0, will follow up relatively soon and will provide 4-bit integer quantization support.
What's Changed
- update csv output format to match OpenAI's Whisper dataframe output by @hykelvinlee42 in #552
- Go binding: NewContext now returns a clean context by @polarmoon in #537
- Added whisper state + default state on the whisper_context by @sandrohanea in #523
- whisper.android: Enable fp16 intrinsics (FP16_VA) which is supported by ARMv8.2 or later. by @tinoue in #572
- Add quality comparison helper by @venkr in #569
- whisper.android: Support benchmark for Android example. by @tinoue in #542
- Fix MUSL Linux build by @ggerganov in #576
- Change default encoding to UTF-8 by @Kamilake in #605
- Provide option for creating JSON output by @tuxpoldo in #615
- readme : add react-native bindings by @jhen0409 in #619
- Fixed language auto-detection for state provided processing. by @sandrohanea in #627
- xcodeproj : add `-O3 -DNDEBUG` in release mode by @jhen0409 in #640
- Nodejs Addon blocking main thread. Implemented Napi::AsyncWorker by @LucasZNK in #642
- Include link to R wrapper in README by @jwijffels in #626
- Add a cmake flag to disable F16C by @a5huynh in #628
- Add talk-llama example by @ggerganov in #664
- Add Alpaca support to talk-llama example by @ejones in #668
- Update README.md by @razodactyl in #682
- issue #470 - working 32-bit ARM by @clach04 in #486
- whisper : add initial_prompt param by @jhen0409 in #645
- fix typo in JSON output by @egorFiNE in #648
- Fix shell script ./models/download-ggml-model.sh to handle spaces and special characters in paths by @be-next in #677
- Fixed test to new async implementation by @LucasZNK in #686
- Minor: fixing usage message for talk-llama by @InconsolableCellist in #687
- Small typo by @ZiggerZZ in #688
- feat: add progress callback by @pajowu in #600
- ggml : fix q4_1 dot product types by @novag in #759
- Exposed various parts to the Go Interface by @bmurray in #697
- Adds shell command example for --print-colors by @bocytko in #710
- Makefile: disable avx in case f16c is not available by @duthils in #706
- Making the quick start instructions clearer. by @Onlyartist9 in #716
- Add lrc output support by @WhichWho in #718
- Corrects default speak.sh path in talk-llama by @mab122 in #720
- Add msvc compiler args /utf-8 fix error C3688 by @WhichWho in #721
- Changed convert-pt-to-ggml.py to use .tiktoken tokenizer files by @ivan-gorin in #725
- talk/talk-llama: add basic example script for eleven-labs tts by @DGdev91 in #728
- readme : add Unity3d bindings by @Macoron in #733
- Update stream.cpp by @AliAlameh in #501
- Fix typos in whisper.h by @GitAritron in #737
- Update LICENSE by @masguit42 in #739
- fix potential memory leaks by @baderouaich in #740
- readme: Add alternate swift bindings by @exPHAT in #755
- Fix the bug related to word splitting errors in the "tokenize" function. by @AfryMask in #760
- Do not launch threads for `log_mel_spectrogram` when singlethreaded by @maxilevi in #763
- Core ML support by @ggerganov in #566
- ggml : fix build on whisper.android (ARM_NEON) by @jhen0409 in #764
New Contributors
- @hykelvinlee42 made their first contribution in #552
- @tinoue made their first contribution in #572
- @venkr made their first contribution in #569
- @Kamilake made their first contribution in #605
- @tuxpoldo made their first contribution in #615
- @jhen0409 made their first contribution in #619
- @LucasZNK made their first contribution in #642
- @jwijffels made their first contribution in #626
- @a5huynh made their first contribution in #628
- @ejones made their first contribution in #668
- @razodactyl made their first contribution in #682
- @clach04 made their first contribution in #486
- @egorFiNE made their first contribution in #648
- @be-next made their first contribution in #677
- @InconsolableCellist made their first contribution in #687
- @ZiggerZZ made their first contribution in #688
- @pajowu made their first contribution in #600
- @novag made their first contribution in #759
- @bmurray made their first contribution in #697
- @bocytko made their first contribution in #710
- @duthils made their first contribution in #706
- @Onlyartist9 made their first contribution in #716
- @WhichWho made their first contribution in #718
- @mab122 made their first contribution in #720
- @ivan-gorin made their first contribution in #725
- @DGdev91 made their first contribution in #728
- @Macoron made their first contribution in #733
- @AliAlameh made their first contribution in #501
- @GitAritron made their first contribution in #737
- @masguit42 made their first contribution in #739
- @baderouaich made their first contribution in #740
- @exPHAT made their first contribution in #755
- @AfryMask made their first contribution in #760
- @maxilevi made their first contribution in #763
Full Changelog: v1.2.1...v1.3.0