Overview
This is a relatively big update with various build and CI improvements, especially for iOS and WASM. There are also some performance gains, most notably in the Metal backend and likely on Arm-based devices.
Big shoutout to @danbev for stepping up and completing the maintenance roadmap for this release!
Mobile examples
All mobile examples have been refreshed. The iOS examples in particular are now much easier to build thanks to the new XCFramework workflow, which should significantly simplify integrating whisper.cpp into third-party iOS and macOS apps. The Core ML build and conversion instructions have also been updated.
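For reference, once the XCFramework is linked into an Xcode project and whisper.h is visible to Swift (for example through a bridging header), transcription comes down to a few C API calls. The sketch below is illustrative rather than taken from the bundled examples; the model path, thread count, and error handling are placeholders.

```swift
import Foundation

// Minimal sketch: transcribe a buffer of 16 kHz mono float PCM samples.
// Assumes the whisper XCFramework is linked and whisper.h is exposed to Swift
// (e.g. via a bridging header). Model path and parameters are illustrative.
func transcribe(samples: [Float], modelPath: String) -> [String] {
    var cparams = whisper_context_default_params()
    cparams.use_gpu = true // Metal on Apple devices, if available

    guard let ctx = whisper_init_from_file_with_params(modelPath, cparams) else {
        return []
    }
    defer { whisper_free(ctx) }

    var fparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY)
    fparams.n_threads      = 4
    fparams.print_progress = false

    var segments: [String] = []
    samples.withUnsafeBufferPointer { buf in
        guard whisper_full(ctx, fparams, buf.baseAddress, Int32(buf.count)) == 0 else {
            return
        }
        for i in 0..<whisper_full_n_segments(ctx) {
            if let text = whisper_full_get_segment_text(ctx, i) {
                segments.append(String(cString: text))
            }
        }
    }
    return segments
}
```

The refreshed whisper.objc and whisper.swiftui examples demonstrate the same flow end-to-end, including the Core ML and Metal setup.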
WASM examples
The WASM examples are now automatically rebuilt on each new commit and hosted on GitHub Pages at https://ggerganov.github.io/whisper.cpp/. Previous problems with CORS rules should now be resolved.
Some performance numbers for this release:
M2 Ultra
Flash Attention ON:
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 ULTRA | METAL | tiny | 1 | 1 | 7.82 | 1.31 | 0.35 | 0.01 | ad4e350 |
M2 ULTRA | METAL | tiny-q5_0 | 1 | 1 | 8.32 | 1.28 | 0.37 | 0.01 | ad4e350 |
M2 ULTRA | METAL | tiny-q5_1 | 1 | 1 | 8.21 | 1.28 | 0.37 | 0.01 | ad4e350 |
M2 ULTRA | METAL | tiny-q8_0 | 1 | 1 | 7.97 | 1.23 | 0.36 | 0.01 | ad4e350 |
M2 ULTRA | METAL | base | 1 | 1 | 13.96 | 1.80 | 0.42 | 0.02 | ad4e350 |
M2 ULTRA | METAL | base-q5_0 | 1 | 1 | 15.19 | 1.75 | 0.42 | 0.02 | ad4e350 |
M2 ULTRA | METAL | base-q5_1 | 1 | 1 | 15.09 | 1.75 | 0.42 | 0.02 | ad4e350 |
M2 ULTRA | METAL | base-q8_0 | 1 | 1 | 14.45 | 1.70 | 0.41 | 0.02 | ad4e350 |
M2 ULTRA | METAL | small | 1 | 1 | 40.08 | 3.54 | 0.86 | 0.05 | ad4e350 |
M2 ULTRA | METAL | small-q5_0 | 1 | 1 | 45.07 | 3.51 | 0.88 | 0.05 | ad4e350 |
M2 ULTRA | METAL | small-q5_1 | 1 | 1 | 45.05 | 3.52 | 0.88 | 0.05 | ad4e350 |
M2 ULTRA | METAL | small-q8_0 | 1 | 1 | 42.04 | 3.34 | 0.85 | 0.05 | ad4e350 |
M2 ULTRA | METAL | medium | 1 | 1 | 107.20 | 7.28 | 1.79 | 0.11 | ad4e350 |
M2 ULTRA | METAL | medium-q5_0 | 1 | 1 | 125.02 | 6.67 | 1.83 | 0.12 | ad4e350 |
M2 ULTRA | METAL | medium-q5_1 | 1 | 1 | 124.83 | 6.70 | 1.84 | 0.12 | ad4e350 |
M2 ULTRA | METAL | medium-q8_0 | 1 | 1 | 114.56 | 6.53 | 1.79 | 0.11 | ad4e350 |
M2 ULTRA | METAL | medium-dis | 1 | 1 | 95.96 | 1.01 | 0.23 | 0.01 | ad4e350 |
M2 ULTRA | METAL | large-v2 | 1 | 1 | 194.29 | 10.57 | 2.67 | 0.20 | ad4e350 |
M2 ULTRA | METAL | large-v2-q5_0 | 1 | 1 | 230.74 | 9.57 | 2.73 | 0.23 | ad4e350 |
M2 ULTRA | METAL | large-v2-q5_1 | 1 | 1 | 229.97 | 9.69 | 2.74 | 0.23 | ad4e350 |
M2 ULTRA | METAL | large-v2-q8_0 | 1 | 1 | 208.11 | 9.37 | 2.60 | 0.21 | ad4e350 |
M2 ULTRA | METAL | large-v2-dis | 1 | 1 | 172.72 | 1.12 | 0.26 | 0.02 | ad4e350 |
M2 ULTRA | METAL | large-v3-turbo | 1 | 1 | 174.46 | 1.74 | 0.42 | 0.03 | ad4e350 |
M2 ULTRA | METAL | large-v3-turbo-q5_0 | 1 | 1 | 205.78 | 1.54 | 0.42 | 0.04 | ad4e350 |
M2 ULTRA | METAL | large-v3-turbo-q8_0 | 1 | 1 | 186.33 | 1.50 | 0.40 | 0.03 | ad4e350 |
Flash Attention OFF:
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 ULTRA | METAL | tiny | 1 | 0 | 8.74 | 1.20 | 0.36 | 0.01 | ad4e350 |
M2 ULTRA | METAL | tiny-q5_0 | 1 | 0 | 10.30 | 1.15 | 0.38 | 0.01 | ad4e350 |
M2 ULTRA | METAL | tiny-q5_1 | 1 | 0 | 10.71 | 1.13 | 0.38 | 0.01 | ad4e350 |
M2 ULTRA | METAL | tiny-q8_0 | 1 | 0 | 9.97 | 1.12 | 0.37 | 0.01 | ad4e350 |
M2 ULTRA | METAL | base | 1 | 0 | 16.77 | 1.71 | 0.44 | 0.02 | ad4e350 |
M2 ULTRA | METAL | base-q5_0 | 1 | 0 | 16.92 | 1.63 | 0.44 | 0.02 | ad4e350 |
M2 ULTRA | METAL | base-q5_1 | 1 | 0 | 16.84 | 1.63 | 0.44 | 0.02 | ad4e350 |
M2 ULTRA | METAL | base-q8_0 | 1 | 0 | 16.12 | 1.63 | 0.44 | 0.02 | ad4e350 |
M2 ULTRA | METAL | small | 1 | 0 | 45.29 | 3.44 | 0.92 | 0.05 | ad4e350 |
M2 ULTRA | METAL | small-q5_0 | 1 | 0 | 50.43 | 3.34 | 0.94 | 0.06 | ad4e350 |
M2 ULTRA | METAL | small-q5_1 | 1 | 0 | 50.49 | 3.35 | 0.93 | 0.06 | ad4e350 |
M2 ULTRA | METAL | small-q8_0 | 1 | 0 | 47.37 | 3.20 | 0.91 | 0.05 | ad4e350 |
M2 ULTRA | METAL | medium | 1 | 0 | 122.81 | 7.39 | 1.99 | 0.12 | ad4e350 |
M2 ULTRA | METAL | medium-q5_0 | 1 | 0 | 140.62 | 6.73 | 2.03 | 0.14 | ad4e350 |
M2 ULTRA | METAL | medium-q5_1 | 1 | 0 | 140.44 | 6.74 | 2.04 | 0.14 | ad4e350 |
M2 ULTRA | METAL | medium-q8_0 | 1 | 0 | 131.05 | 6.54 | 1.95 | 0.13 | ad4e350 |
M2 ULTRA | METAL | medium-dis | 1 | 0 | 110.95 | 0.99 | 0.24 | 0.02 | ad4e350 |
M2 ULTRA | METAL | large-v2 | 1 | 0 | 222.19 | 10.93 | 3.01 | 0.21 | ad4e350 |
M2 ULTRA | METAL | large-v2-q5_0 | 1 | 0 | 258.47 | 9.75 | 3.01 | 0.25 | ad4e350 |
M2 ULTRA | METAL | large-v2-q5_1 | 1 | 0 | 258.40 | 9.85 | 3.01 | 0.24 | ad4e350 |
M2 ULTRA | METAL | large-v2-q8_0 | 1 | 0 | 236.68 | 9.61 | 2.85 | 0.23 | ad4e350 |
M2 ULTRA | METAL | large-v2-dis | 1 | 0 | 199.28 | 1.12 | 0.27 | 0.02 | ad4e350 |
M2 ULTRA | METAL | large-v3-turbo | 1 | 0 | 201.49 | 1.76 | 0.45 | 0.03 | ad4e350 |
M2 ULTRA | METAL | large-v3-turbo-q5_0 | 1 | 0 | 233.70 | 1.55 | 0.46 | 0.04 | ad4e350 |
M2 ULTRA | METAL | large-v3-turbo-q8_0 | 1 | 0 | 214.20 | 1.51 | 0.44 | 0.04 | ad4e350 |
M4 Max
Flash Attention ON:
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M4 Max | METAL | tiny | 1 | 1 | 15.22 | 0.89 | 0.26 | 0.01 | ad4e350 |
M4 Max | METAL | tiny-q8_0 | 1 | 1 | 14.70 | 0.86 | 0.26 | 0.01 | ad4e350 |
M4 Max | METAL | base | 1 | 1 | 25.33 | 1.36 | 0.30 | 0.02 | ad4e350 |
M4 Max | METAL | base-q8_0 | 1 | 1 | 21.27 | 1.31 | 0.30 | 0.02 | ad4e350 |
M4 Max | METAL | small | 1 | 1 | 58.43 | 2.78 | 0.60 | 0.05 | ad4e350 |
M4 Max | METAL | small-q8_0 | 1 | 1 | 60.26 | 2.39 | 0.60 | 0.05 | ad4e350 |
M4 Max | METAL | medium | 1 | 1 | 169.73 | 6.03 | 1.31 | 0.14 | ad4e350 |
M4 Max | METAL | medium-q8_0 | 1 | 1 | 176.61 | 4.99 | 1.31 | 0.14 | ad4e350 |
M4 Max | METAL | large-v2 | 1 | 1 | 316.18 | 9.60 | 2.08 | 0.24 | ad4e350 |
M4 Max | METAL | large-v2-q8_0 | 1 | 1 | 329.59 | 7.55 | 2.08 | 0.25 | ad4e350 |
Flash Attention OFF:
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M4 Max | METAL | tiny | 1 | 0 | 13.12 | 0.87 | 0.29 | 0.01 | ad4e350 |
M4 Max | METAL | tiny-q8_0 | 1 | 0 | 15.90 | 0.88 | 0.31 | 0.01 | ad4e350 |
M4 Max | METAL | base | 1 | 0 | 23.10 | 1.42 | 0.34 | 0.02 | ad4e350 |
M4 Max | METAL | base-q8_0 | 1 | 0 | 27.25 | 1.31 | 0.34 | 0.02 | ad4e350 |
M4 Max | METAL | small | 1 | 0 | 71.76 | 3.02 | 0.70 | 0.06 | ad4e350 |
M4 Max | METAL | small-q8_0 | 1 | 0 | 73.88 | 2.60 | 0.71 | 0.06 | ad4e350 |
M4 Max | METAL | medium | 1 | 0 | 208.22 | 6.94 | 1.55 | 0.16 | ad4e350 |
M4 Max | METAL | medium-q8_0 | 1 | 0 | 214.65 | 5.90 | 1.57 | 0.17 | ad4e350 |
M4 Max | METAL | large-v2 | 1 | 0 | 381.72 | 11.28 | 2.51 | 0.29 | ad4e350 |
M4 Max | METAL | large-v2-q8_0 | 1 | 0 | 394.97 | 8.90 | 2.45 | 0.30 | ad4e350 |
V100
Flash Attention ON:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
V100 | AVX2 CUDA | tiny | 8 | 1 | 4.01 | 0.90 | 0.25 | 0.01 | ad4e350 |
V100 | AVX2 CUDA | tiny-q5_1 | 8 | 1 | 4.12 | 0.88 | 0.18 | 0.01 | ad4e350 |
V100 | AVX2 CUDA | base | 8 | 1 | 7.00 | 1.30 | 0.35 | 0.01 | ad4e350 |
V100 | AVX2 CUDA | base-q5_1 | 8 | 1 | 7.22 | 1.21 | 0.26 | 0.02 | ad4e350 |
V100 | AVX2 CUDA | small | 8 | 1 | 18.68 | 2.39 | 0.69 | 0.03 | ad4e350 |
V100 | AVX2 CUDA | small-q5_1 | 8 | 1 | 19.38 | 2.32 | 0.51 | 0.03 | ad4e350 |
V100 | AVX2 CUDA | medium | 8 | 1 | 53.17 | 5.15 | 1.45 | 0.06 | ad4e350 |
V100 | AVX2 CUDA | medium-q5_0 | 8 | 1 | 55.09 | 4.64 | 1.05 | 0.07 | ad4e350 |
V100 | AVX2 CUDA | large-v2 | 8 | 1 | 85.77 | 7.57 | 2.19 | 0.10 | ad4e350 |
V100 | AVX2 CUDA | large-v2-q5_0 | 8 | 1 | 89.24 | 6.48 | 1.48 | 0.11 | ad4e350 |
V100 | AVX2 CUDA | large-v3-turbo | 8 | 1 | 75.56 | 1.25 | 0.37 | 0.02 | ad4e350 |
V100 | AVX2 CUDA | large-v3-turbo-q5_0 | 8 | 1 | 78.48 | 1.01 | 0.24 | 0.02 | ad4e350 |
Flash Attention OFF:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
V100 | AVX2 CUDA | tiny | 8 | 0 | 6.15 | 1.02 | 0.30 | 0.01 | ad4e350 |
V100 | AVX2 CUDA | tiny-q5_1 | 8 | 0 | 5.92 | 0.96 | 0.25 | 0.01 | ad4e350 |
V100 | AVX2 CUDA | base | 8 | 0 | 10.60 | 1.43 | 0.43 | 0.02 | ad4e350 |
V100 | AVX2 CUDA | base-q5_1 | 8 | 0 | 10.80 | 1.37 | 0.36 | 0.02 | ad4e350 |
V100 | AVX2 CUDA | small | 8 | 0 | 31.83 | 2.82 | 0.87 | 0.04 | ad4e350 |
V100 | AVX2 CUDA | small-q5_1 | 8 | 0 | 31.88 | 2.68 | 0.72 | 0.04 | ad4e350 |
V100 | AVX2 CUDA | medium | 8 | 0 | 81.30 | 6.02 | 1.81 | 0.09 | ad4e350 |
V100 | AVX2 CUDA | medium-q5_0 | 8 | 0 | 83.21 | 5.44 | 1.41 | 0.10 | ad4e350 |
V100 | AVX2 CUDA | large-v2 | 8 | 0 | 134.81 | 8.64 | 2.69 | 0.14 | ad4e350 |
V100 | AVX2 CUDA | large-v2-q5_0 | 8 | 0 | 138.95 | 7.57 | 2.04 | 0.15 | ad4e350 |
V100 | AVX2 CUDA | large-v3-turbo | 8 | 0 | 124.42 | 1.37 | 0.43 | 0.02 | ad4e350 |
V100 | AVX2 CUDA | large-v3-turbo-q5_0 | 8 | 0 | 127.81 | 1.13 | 0.32 | 0.03 | ad4e350 |
What's Changed
- docs: Fix main -> whisper-cli in download scripts by @domdomegg in #2707
- Adding back docker instructions in v1.7.4 by @jayant-yadav in #2711
- Expose whisper_full_get_segment_no_speech_prob_from_state by @sandrohanea in #2716
- Use Unique Filenames for FFmpeg Conversion to Prevent File Overwrites by @NETZkultur in #2718
- talk-llama : sync llama.cpp by @ggerganov in #2709
- whisper : fix gpu device selection by @ggerganov in #2728
- sync : ggml by @ggerganov in #2737
- Fix whisper.objc Xcode project by @iamcgn in #2736
- ruby : Make context accept initial parameters, API to retrieve a segment and more by @KitaitiMakoto in #2749
- sync : ggml by @ggerganov in #2779
- Fix coreml export script by @mgrachten in #2770
- Add max_len param in node addon by @billyct in #2760
- cmake : fix compile assumptions for power9/etc by @midnightmagic in #2777
- Fixes for Windows by @foldl in #2790
- Restore big endian support by @fitzsim in #2816
- Add beam size parameter to stream example by @masahji in #2836
- sync : ggml by @ggerganov in #2844
- Adjusted stream example to stop on ^C when no audio is received. by @petterreinholdtsen in #2822
- Use miniaudio for direct decoding flac, mp3, ogg and wav by @data-man in #2759
- common : try to fix build by @ggerganov in #2845
- common : separate whisper sources by @ggerganov in #2846
- whisper : support GGML_BACKEND_DL by @slaren in #2843
- ruby : follow audio library change by @KitaitiMakoto in #2851
- fix: missing include common-whisper in addon.node by @buxuku in #2858
- Fixes audio loading by miniaudio by @data-man in #2862
- Updated models download URL by @AMDphreak in #2756
- [Fix] When m_audio_pos overflows, m_audio_len should also increase in… by @Ivy233 in #2855
- sync : ggml by @ggerganov in #2868
- whisper: add xcframework build script by @mdestagnol in #2873
- examples : add dl to the list of libraries linked by @danbev in #2875
- ggml-ci: add run.sh by @redraskal in #2877
- ggml-ci: update input env variables to GG_BUILD_ by @redraskal in #2879
- examples : add GGML_USE_CPU=ON flag to whisper.objc by @danbev in #2880
- Update convert-h5-to-ggml.py by @fltman in #2840
- whisper : add option to use system-installed GGML by @peter277 in #2887
- examples : use xcframework in whisper.objc example by @danbev in #2882
- ci : add release job and include xcframework by @danbev in #2889
- whisper : enable compiler warnings for src by @danbev in #2891
- ci : add missing env.branch_name to build.yml by @danbev in #2896
- whisper : fix compiler warnings in whisper.cpp by @danbev in #2895
- ci : add ccache action to windows-cublas job by @danbev in #2893
- Implementing Encoder Begin Callback for golang binding by @aderbedr in #2900
- ci : refactor cuda toolkit installation steps by @danbev in #2902
- examples : command.wasm updates by @danbev in #2904
- examples : update wasm examples to include server.py [no ci] by @danbev in #2908
- ci : use ninja and fix caching for windows-cublas by @danbev in #2910
- examples : add WHISPER_SDL2 check to deprecation executables by @danbev in #2911
- xcframework : add support for CoreML to ios/macOS by @danbev in #2912
- ci : increase windows-cublas evict-old-files to 5d by @danbev in #2915
- examples : update whisper.objc README.md by @danbev in #2916
- whisper : add check for CPU backend initialization by @danbev in #2918
- readme : update Python version to 3.11 for Core ML support [no ci] by @danbev in #2919
- whisper.swiftui : Add Core ML support to README [no ci] by @danbev in #2921
- whisper : update default model download directory behavior to use current working directory when script is in /bin/ directory by @peter277 in #2924
- ci : remove CMAKE_CUDA_ARCHITECTURES in windows-cublas by @danbev in #2923
- whisper : initialize decoder's rng with unique seed by @danbev in #2932
- whisper : enhance model download scripts functionality and resolve compiler warning by @peter277 in #2925
- ggml : add logging for native build options/vars by @danbev in #2935
- examples : fix request path for local worker files by @danbev in #2937
- examples : fix nthread parsing in whisper.wasm by @danbev in #2938
- examples : reduce initial memory to 512MB by @danbev in #2939
- ci: fix SYCL build by @qnixsynapse in #2943
- whisper.android.java : update build with ggml source changes by @danbev in #2942
- whisper.android : add GGML_USE_CPU compile definition by @danbev in #2945
- Update README.md by @Page-MS in #2946
- bindings.javascript : update test instructions [no ci] by @danbev in #2951
- bindings.java : enable copyLibs task [no ci] by @danbev in #2949
- whisper : add support for backends with multiple ggml_backend_buffer_type by @eddnjjn in #2863
- bindings-go : update Makefile to use cmake by @danbev in #2952
- sync : ggml by @ggerganov in #2953
- support progress_callback API for addon.node by @buxuku in #2941
- bindings.ruby : fix test failures in test_whisper by @danbev in #2955
- Adding in DetectedLanguage to go bindings by @aderbedr in #2947
- sync : ggml by @ggerganov in #2962
- whisper : remove unnecessary GGML_UNUSED macro by @danbev in #2960
- feat: add health check endpoint to server by @sachaarbonel in #2968
- ci : add github pages workflow for wasm examples by @danbev in #2969
- examples : update README links to point to pages deployment by @danbev in #2971
- cmake: improve Vulkan cooperative matrix support checks by @sandrohanea in #2966
- sync : ggml by @ggerganov in #2972
- ci : re-enable android_java job by @danbev in #2958
- ci : re-enable freeBSD-latest job by @danbev in #2973
- android.java : re-add ggml source updates by @danbev in #2975
- tests : re-enable tests [no ci] by @danbev in #2977
- ci : add coreml job that converts base.en to coreml [no ci] by @danbev in #2981
- coreml: fix Whisper to CoreML conversion by disabling SDPA [no ci] by @danbev in #2979
- whisper.objc : fix typo in README.md [no ci] by @danbev in #2985
- ci : remove intermediate build on push to master by @danbev in #2986
- examples : clarify Core ML encoder model usage [no ci] by @danbev in #2987
- tests : remove gh label test-whisper-cli-tiny-en by @danbev in #2988
- sync : ggml by @ggerganov in #2991
- bench : update numbers [no ci] by @ggerganov in #2993
- release : v1.7.5 by @ggerganov in #2994
New Contributors
- @domdomegg made their first contribution in #2707
- @jayant-yadav made their first contribution in #2711
- @NETZkultur made their first contribution in #2718
- @iamcgn made their first contribution in #2736
- @mgrachten made their first contribution in #2770
- @billyct made their first contribution in #2760
- @midnightmagic made their first contribution in #2777
- @foldl made their first contribution in #2790
- @masahji made their first contribution in #2836
- @data-man made their first contribution in #2759
- @buxuku made their first contribution in #2858
- @AMDphreak made their first contribution in #2756
- @Ivy233 made their first contribution in #2855
- @mdestagnol made their first contribution in #2873
- @danbev made their first contribution in #2875
- @redraskal made their first contribution in #2877
- @fltman made their first contribution in #2840
- @peter277 made their first contribution in #2887
- @aderbedr made their first contribution in #2900
- @qnixsynapse made their first contribution in #2943
- @Page-MS made their first contribution in #2946
- @eddnjjn made their first contribution in #2863
Full Changelog: v1.7.4...v1.7.5