Overview
- Massive performance improvements for the Metal backend, especially for beam search (beam size > 1) and for quantized models
- Reduce hallucinations during silence by @jkarthic in #2629
- Implement no_speech_thold by @jkarthic in #2625
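
To illustrate what a no-speech threshold is meant to do, here is a minimal sketch of the kind of decision rule it implies. It is not the code from #2625. The rule, the `SegmentStats` container, the `is_silence` helper, and the default values (0.6 and -1.0) are assumptions for illustration, modeled loosely on the behaviour of the reference OpenAI Whisper implementation: a segment is dropped only when the no-speech probability is high and the decoder was not confident in the text it produced.

```python
from dataclasses import dataclass


@dataclass
class SegmentStats:
    """Per-segment decoder statistics (hypothetical container for this sketch)."""
    no_speech_prob: float  # probability assigned to the "no speech" token
    avg_logprob: float     # average log-probability of the decoded tokens


def is_silence(seg: SegmentStats,
               no_speech_thold: float = 0.6,
               logprob_thold: float = -1.0) -> bool:
    """Return True when the segment should be dropped as silence.

    Assumed rule: drop only if the no-speech probability exceeds the
    threshold AND the average log-probability is below its threshold,
    so confidently decoded text is never discarded.
    """
    return (seg.no_speech_prob > no_speech_thold
            and seg.avg_logprob < logprob_thold)


segments = [
    SegmentStats(no_speech_prob=0.92, avg_logprob=-1.8),  # likely silence, dropped
    SegmentStats(no_speech_prob=0.92, avg_logprob=-0.3),  # confident text, kept
    SegmentStats(no_speech_prob=0.10, avg_logprob=-1.8),  # low no-speech prob, kept
]

# Keep only the segments that are not classified as silence.
kept = [s for s in segments if not is_silence(s)]
```

Under this assumed rule, raising `no_speech_thold` toward 1.0 makes the filter more permissive, and a value of 1.0 or above effectively disables it because a probability cannot exceed 1.0.
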
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 Ultra | Metal | tiny | 1 | 1 | 7.90 | 1.26 | 0.35 | 0.01 | ed733e8 |
M2 Ultra | Metal | tiny-q5_0 | 1 | 1 | 8.44 | 1.23 | 0.36 | 0.01 | ed733e8 |
M2 Ultra | Metal | tiny-q5_1 | 1 | 1 | 8.26 | 1.27 | 0.37 | 0.01 | ed733e8 |
M2 Ultra | Metal | tiny-q8_0 | 1 | 1 | 8.03 | 1.21 | 0.35 | 0.01 | ed733e8 |
M2 Ultra | Metal | base | 1 | 1 | 13.77 | 1.80 | 0.42 | 0.02 | ed733e8 |
M2 Ultra | Metal | base-q5_0 | 1 | 1 | 15.02 | 1.72 | 0.42 | 0.02 | ed733e8 |
M2 Ultra | Metal | base-q5_1 | 1 | 1 | 14.93 | 1.74 | 0.42 | 0.02 | ed733e8 |
M2 Ultra | Metal | base-q8_0 | 1 | 1 | 14.26 | 1.68 | 0.41 | 0.02 | ed733e8 |
M2 Ultra | Metal | small | 1 | 1 | 39.76 | 3.54 | 0.85 | 0.05 | ed733e8 |
M2 Ultra | Metal | small-q5_0 | 1 | 1 | 45.07 | 3.47 | 0.87 | 0.05 | ed733e8 |
M2 Ultra | Metal | small-q5_1 | 1 | 1 | 44.82 | 3.49 | 0.87 | 0.05 | ed733e8 |
M2 Ultra | Metal | small-q8_0 | 1 | 1 | 41.79 | 3.30 | 0.84 | 0.05 | ed733e8 |
M2 Ultra | Metal | medium | 1 | 1 | 106.73 | 7.28 | 1.78 | 0.11 | ed733e8 |
M2 Ultra | Metal | medium-q5_0 | 1 | 1 | 124.43 | 6.63 | 1.83 | 0.12 | ed733e8 |
M2 Ultra | Metal | medium-q5_1 | 1 | 1 | 124.19 | 6.70 | 1.84 | 0.12 | ed733e8 |
M2 Ultra | Metal | medium-q8_0 | 1 | 1 | 113.88 | 6.52 | 1.75 | 0.11 | ed733e8 |
M2 Ultra | Metal | medium-dis | 1 | 1 | 94.97 | 0.97 | 0.22 | 0.01 | ed733e8 |
M2 Ultra | Metal | large-v2 | 1 | 1 | 193.33 | 10.53 | 2.65 | 0.20 | ed733e8 |
M2 Ultra | Metal | large-v2-q5_0 | 1 | 1 | 229.22 | 9.52 | 2.72 | 0.23 | ed733e8 |
M2 Ultra | Metal | large-v2-q5_1 | 1 | 1 | 229.40 | 9.62 | 2.73 | 0.23 | ed733e8 |
M2 Ultra | Metal | large-v2-q8_0 | 1 | 1 | 207.30 | 9.36 | 2.59 | 0.21 | ed733e8 |
M2 Ultra | Metal | large-v2-dis | 1 | 1 | 171.43 | 1.09 | 0.25 | 0.02 | ed733e8 |
M2 Ultra | Metal | large-v3-turbo | 1 | 1 | 173.45 | 1.73 | 0.41 | 0.03 | ed733e8 |
M2 Ultra | Metal | large-v3-turbo-q5_0 | 1 | 1 | 205.52 | 1.52 | 0.42 | 0.04 | ed733e8 |
M2 Ultra | Metal | large-v3-turbo-q8_0 | 1 | 1 | 185.90 | 1.48 | 0.40 | 0.03 | ed733e8 |
What's Changed
- sync : ggml by @ggerganov in #2573
- ruby : Follow source tree change by @KitaitiMakoto in #2580
- Add `q8_0` models to `download-ggml-model.sh` by @mrienstra in #2589
- ruby : Add low-level methods to transcribe by @KitaitiMakoto in #2585
- sync : ggml by @ggerganov in #2608
- ruby : Sync whisper.cpp and model download feature by @KitaitiMakoto in #2617
- Fix typo in `download-ggml-model.sh` by @mrienstra in #2623
- Add Missing Include Directory for ggml-cpu in whisper.android CMakeLists by @Thamster in #2624
- fix: prevent division by zero in soft_max vulkan shader by @gn64 in #2633
- cmake : fix "amd64" processor string by @ggerganov in #2638
- Fix typo in Java Binding README by @crummyh in #2637
- Fix hallucinations during silence by @jkarthic in #2629
- Implement no_speech_thold by @jkarthic in #2625
- Improve consistency in stream example README commands by @crummyh in #2642
- ruby : Add no_speech_thold by @KitaitiMakoto in #2641
- sync : ggml by @ggerganov in #2639
- ci : msys enable SDL2 build by @ggerganov in #2635
New Contributors
- @Thamster made their first contribution in #2624
- @gn64 made their first contribution in #2633
- @crummyh made their first contribution in #2637
- @jkarthic made their first contribution in #2629
Full Changelog: v1.7.2...v1.7.3