ggml-org/whisper.cpp v1.7.5 on GitHub

Overview

This is a relatively big update with various build and CI improvements, especially for iOS and WASM. There are also some performance gains, mainly for the Metal backend and probably for Arm-based devices.

Big shoutout to @danbev for stepping up and completing the maintenance roadmap for this release!

Mobile examples

All mobile examples have been refreshed. The iOS examples in particular are now much easier to build thanks to the new XCFramework workflow, which should significantly simplify integrating whisper.cpp into third-party iOS and macOS apps. The Core ML build and conversion instructions have also been updated.

WASM examples

The WASM examples are now automatically rebuilt on each new commit and hosted on GitHub Pages at https://ggerganov.github.io/whisper.cpp/. Problems with CORS rules should be resolved.
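The hosted pages need no setup. If you build the WASM examples yourself and serve them locally instead, keep in mind that threaded WebAssembly builds rely on `SharedArrayBuffer`, which browsers only enable on cross-origin-isolated pages (served with `Cross-Origin-Opener-Policy: same-origin` and `Cross-Origin-Embedder-Policy: require-corp`). Below is a minimal static-server sketch in Python under that assumption. `IsolatedHandler` and `serve` are illustrative names. This is not the `server.py` added in #2908, whose behavior may differ.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# Headers that make a page cross-origin isolated. Browsers only expose
# SharedArrayBuffer, which threaded WASM builds rely on, on such pages.
ISOLATION_HEADERS = {
    "Cross-Origin-Opener-Policy": "same-origin",
    "Cross-Origin-Embedder-Policy": "require-corp",
}


class IsolatedHandler(SimpleHTTPRequestHandler):
    """Static file handler that adds the isolation headers to every response."""

    # Serve .wasm files as application/wasm so streaming compilation works.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".wasm": "application/wasm",
    }

    def end_headers(self) -> None:
        # Append the isolation headers just before the header block is closed.
        for name, value in ISOLATION_HEADERS.items():
            self.send_header(name, value)
        super().end_headers()


def serve(directory: str = ".", port: int = 8000) -> None:
    """Serve `directory` on localhost until interrupted with Ctrl+C."""
    handler = functools.partial(IsolatedHandler, directory=directory)
    with ThreadingHTTPServer(("127.0.0.1", port), handler) as httpd:
        print(f"Serving {directory} at http://127.0.0.1:{port}/")
        httpd.serve_forever()
```

For example, call `serve("path/to/wasm-build")`, where the path is a placeholder for wherever your build places the example files, then open http://127.0.0.1:8000/ in the browser.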

Some performance numbers for this release, all measured at commit `ad4e350` (Th = number of threads; FA = Flash Attention, 1 = enabled):

M2 Ultra

Flash Attention ON:

CPU	Config	Model	Th	FA	Enc.	Dec.	Bch5	PP	Commit
M2 ULTRA	METAL	tiny	1	1	7.82	1.31	0.35	0.01	`ad4e350`
M2 ULTRA	METAL	tiny-q5_0	1	1	8.32	1.28	0.37	0.01	`ad4e350`
M2 ULTRA	METAL	tiny-q5_1	1	1	8.21	1.28	0.37	0.01	`ad4e350`
M2 ULTRA	METAL	tiny-q8_0	1	1	7.97	1.23	0.36	0.01	`ad4e350`
M2 ULTRA	METAL	base	1	1	13.96	1.80	0.42	0.02	`ad4e350`
M2 ULTRA	METAL	base-q5_0	1	1	15.19	1.75	0.42	0.02	`ad4e350`
M2 ULTRA	METAL	base-q5_1	1	1	15.09	1.75	0.42	0.02	`ad4e350`
M2 ULTRA	METAL	base-q8_0	1	1	14.45	1.70	0.41	0.02	`ad4e350`
M2 ULTRA	METAL	small	1	1	40.08	3.54	0.86	0.05	`ad4e350`
M2 ULTRA	METAL	small-q5_0	1	1	45.07	3.51	0.88	0.05	`ad4e350`
M2 ULTRA	METAL	small-q5_1	1	1	45.05	3.52	0.88	0.05	`ad4e350`
M2 ULTRA	METAL	small-q8_0	1	1	42.04	3.34	0.85	0.05	`ad4e350`
M2 ULTRA	METAL	medium	1	1	107.20	7.28	1.79	0.11	`ad4e350`
M2 ULTRA	METAL	medium-q5_0	1	1	125.02	6.67	1.83	0.12	`ad4e350`
M2 ULTRA	METAL	medium-q5_1	1	1	124.83	6.70	1.84	0.12	`ad4e350`
M2 ULTRA	METAL	medium-q8_0	1	1	114.56	6.53	1.79	0.11	`ad4e350`
M2 ULTRA	METAL	medium-dis	1	1	95.96	1.01	0.23	0.01	`ad4e350`
M2 ULTRA	METAL	large-v2	1	1	194.29	10.57	2.67	0.20	`ad4e350`
M2 ULTRA	METAL	large-v2-q5_0	1	1	230.74	9.57	2.73	0.23	`ad4e350`
M2 ULTRA	METAL	large-v2-q5_1	1	1	229.97	9.69	2.74	0.23	`ad4e350`
M2 ULTRA	METAL	large-v2-q8_0	1	1	208.11	9.37	2.60	0.21	`ad4e350`
M2 ULTRA	METAL	large-v2-dis	1	1	172.72	1.12	0.26	0.02	`ad4e350`
M2 ULTRA	METAL	large-v3-turbo	1	1	174.46	1.74	0.42	0.03	`ad4e350`
M2 ULTRA	METAL	large-v3-turbo-q5_0	1	1	205.78	1.54	0.42	0.04	`ad4e350`
M2 ULTRA	METAL	large-v3-turbo-q8_0	1	1	186.33	1.50	0.40	0.03	`ad4e350`

Flash Attention OFF:

CPU	Config	Model	Th	Enc.	Dec.	Bch5	PP	Commit
M2 ULTRA	METAL	tiny	1	8.74	1.20	0.36	0.01	`ad4e350`
M2 ULTRA	METAL	tiny-q5_0	1	10.30	1.15	0.38	0.01	`ad4e350`
M2 ULTRA	METAL	tiny-q5_1	1	10.71	1.13	0.38	0.01	`ad4e350`
M2 ULTRA	METAL	tiny-q8_0	1	9.97	1.12	0.37	0.01	`ad4e350`
M2 ULTRA	METAL	base	1	16.77	1.71	0.44	0.02	`ad4e350`
M2 ULTRA	METAL	base-q5_0	1	16.92	1.63	0.44	0.02	`ad4e350`
M2 ULTRA	METAL	base-q5_1	1	16.84	1.63	0.44	0.02	`ad4e350`
M2 ULTRA	METAL	base-q8_0	1	16.12	1.63	0.44	0.02	`ad4e350`
M2 ULTRA	METAL	small	1	45.29	3.44	0.92	0.05	`ad4e350`
M2 ULTRA	METAL	small-q5_0	1	50.43	3.34	0.94	0.06	`ad4e350`
M2 ULTRA	METAL	small-q5_1	1	50.49	3.35	0.93	0.06	`ad4e350`
M2 ULTRA	METAL	small-q8_0	1	47.37	3.20	0.91	0.05	`ad4e350`
M2 ULTRA	METAL	medium	1	122.81	7.39	1.99	0.12	`ad4e350`
M2 ULTRA	METAL	medium-q5_0	1	140.62	6.73	2.03	0.14	`ad4e350`
M2 ULTRA	METAL	medium-q5_1	1	140.44	6.74	2.04	0.14	`ad4e350`
M2 ULTRA	METAL	medium-q8_0	1	131.05	6.54	1.95	0.13	`ad4e350`
M2 ULTRA	METAL	medium-dis	1	110.95	0.99	0.24	0.02	`ad4e350`
M2 ULTRA	METAL	large-v2	1	222.19	10.93	3.01	0.21	`ad4e350`
M2 ULTRA	METAL	large-v2-q5_0	1	258.47	9.75	3.01	0.25	`ad4e350`
M2 ULTRA	METAL	large-v2-q5_1	1	258.40	9.85	3.01	0.24	`ad4e350`
M2 ULTRA	METAL	large-v2-q8_0	1	236.68	9.61	2.85	0.23	`ad4e350`
M2 ULTRA	METAL	large-v2-dis	1	199.28	1.12	0.27	0.02	`ad4e350`
M2 ULTRA	METAL	large-v3-turbo	1	201.49	1.76	0.45	0.03	`ad4e350`
M2 ULTRA	METAL	large-v3-turbo-q5_0	1	233.70	1.55	0.46	0.04	`ad4e350`
M2 ULTRA	METAL	large-v3-turbo-q8_0	1	214.20	1.51	0.44	0.04	`ad4e350`

M4 Max

Flash Attention ON:

CPU	Config	Model	Th	FA	Enc.	Dec.	Bch5	PP	Commit
M4 Max	METAL	tiny	1	1	15.22	0.89	0.26	0.01	`ad4e350`
M4 Max	METAL	tiny-q8_0	1	1	14.70	0.86	0.26	0.01	`ad4e350`
M4 Max	METAL	base	1	1	25.33	1.36	0.30	0.02	`ad4e350`
M4 Max	METAL	base-q8_0	1	1	21.27	1.31	0.30	0.02	`ad4e350`
M4 Max	METAL	small	1	1	58.43	2.78	0.60	0.05	`ad4e350`
M4 Max	METAL	small-q8_0	1	1	60.26	2.39	0.60	0.05	`ad4e350`
M4 Max	METAL	medium	1	1	169.73	6.03	1.31	0.14	`ad4e350`
M4 Max	METAL	medium-q8_0	1	1	176.61	4.99	1.31	0.14	`ad4e350`
M4 Max	METAL	large-v2	1	1	316.18	9.60	2.08	0.24	`ad4e350`
M4 Max	METAL	large-v2-q8_0	1	1	329.59	7.55	2.08	0.25	`ad4e350`

Flash Attention OFF:

CPU	Config	Model	Th	Enc.	Dec.	Bch5	PP	Commit
M4 Max	METAL	tiny	1	13.12	0.87	0.29	0.01	`ad4e350`
M4 Max	METAL	tiny-q8_0	1	15.90	0.88	0.31	0.01	`ad4e350`
M4 Max	METAL	base	1	23.10	1.42	0.34	0.02	`ad4e350`
M4 Max	METAL	base-q8_0	1	27.25	1.31	0.34	0.02	`ad4e350`
M4 Max	METAL	small	1	71.76	3.02	0.70	0.06	`ad4e350`
M4 Max	METAL	small-q8_0	1	73.88	2.60	0.71	0.06	`ad4e350`
M4 Max	METAL	medium	1	208.22	6.94	1.55	0.16	`ad4e350`
M4 Max	METAL	medium-q8_0	1	214.65	5.90	1.57	0.17	`ad4e350`
M4 Max	METAL	large-v2	1	381.72	11.28	2.51	0.29	`ad4e350`
M4 Max	METAL	large-v2-q8_0	1	394.97	8.90	2.45	0.30	`ad4e350`

V100

Flash Attention ON:

GPU	Config	Model	Th	FA	Enc.	Dec.	Bch5	PP	Commit
V100	AVX2 CUDA	tiny	8	1	4.01	0.90	0.25	0.01	`ad4e350`
V100	AVX2 CUDA	tiny-q5_1	8	1	4.12	0.88	0.18	0.01	`ad4e350`
V100	AVX2 CUDA	base	8	1	7.00	1.30	0.35	0.01	`ad4e350`
V100	AVX2 CUDA	base-q5_1	8	1	7.22	1.21	0.26	0.02	`ad4e350`
V100	AVX2 CUDA	small	8	1	18.68	2.39	0.69	0.03	`ad4e350`
V100	AVX2 CUDA	small-q5_1	8	1	19.38	2.32	0.51	0.03	`ad4e350`
V100	AVX2 CUDA	medium	8	1	53.17	5.15	1.45	0.06	`ad4e350`
V100	AVX2 CUDA	medium-q5_0	8	1	55.09	4.64	1.05	0.07	`ad4e350`
V100	AVX2 CUDA	large-v2	8	1	85.77	7.57	2.19	0.10	`ad4e350`
V100	AVX2 CUDA	large-v2-q5_0	8	1	89.24	6.48	1.48	0.11	`ad4e350`
V100	AVX2 CUDA	large-v3-turbo	8	1	75.56	1.25	0.37	0.02	`ad4e350`
V100	AVX2 CUDA	large-v3-turbo-q5_0	8	1	78.48	1.01	0.24	0.02	`ad4e350`

Flash Attention OFF:

GPU	Config	Model	Th	Enc.	Dec.	Bch5	PP	Commit
V100	AVX2 CUDA	tiny	8	6.15	1.02	0.30	0.01	`ad4e350`
V100	AVX2 CUDA	tiny-q5_1	8	5.92	0.96	0.25	0.01	`ad4e350`
V100	AVX2 CUDA	base	8	10.60	1.43	0.43	0.02	`ad4e350`
V100	AVX2 CUDA	base-q5_1	8	10.80	1.37	0.36	0.02	`ad4e350`
V100	AVX2 CUDA	small	8	31.83	2.82	0.87	0.04	`ad4e350`
V100	AVX2 CUDA	small-q5_1	8	31.88	2.68	0.72	0.04	`ad4e350`
V100	AVX2 CUDA	medium	8	81.30	6.02	1.81	0.09	`ad4e350`
V100	AVX2 CUDA	medium-q5_0	8	83.21	5.44	1.41	0.10	`ad4e350`
V100	AVX2 CUDA	large-v2	8	134.81	8.64	2.69	0.14	`ad4e350`
V100	AVX2 CUDA	large-v2-q5_0	8	138.95	7.57	2.04	0.15	`ad4e350`
V100	AVX2 CUDA	large-v3-turbo	8	124.42	1.37	0.43	0.02	`ad4e350`
V100	AVX2 CUDA	large-v3-turbo-q5_0	8	127.81	1.13	0.32	0.03	`ad4e350`
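The Flash Attention ON and OFF tables are easiest to compare as a ratio of encoder times (OFF divided by ON; values above 1 mean FA is faster). Here is a minimal Python sketch using the large-v2 `Enc.` values copied from the tables above. The dictionary layout and the `fa_speedup` helper are illustrative and are not part of whisper.cpp's bench tooling.

```python
# Large-v2 encoder times ("Enc." column) from the v1.7.5 tables, commit ad4e350.
# Each entry is (Flash Attention ON, Flash Attention OFF).
LARGE_V2_ENCODER = {
    "M2 Ultra (Metal)": (194.29, 222.19),
    "M4 Max (Metal)": (316.18, 381.72),
    "V100 (CUDA)": (85.77, 134.81),
}


def fa_speedup(on: float, off: float) -> float:
    """Return OFF/ON: how many times faster the encoder is with FA enabled."""
    if on <= 0:
        raise ValueError("timing must be positive")
    return off / on


for device, (on, off) in LARGE_V2_ENCODER.items():
    print(f"{device:<18} {fa_speedup(on, off):.2f}x")
```

With these numbers it prints about 1.14x for the M2 Ultra, 1.21x for the M4 Max and 1.57x for the V100. The ratio is not above 1 everywhere: for tiny on the M4 Max, the FA ON encoder time (15.22) is higher than the FA OFF one (13.12).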

What's Changed

docs: Fix main -> whisper-cli in download scripts by @domdomegg in #2707
Adding back docker instructions in v1.7.4 by @jayant-yadav in #2711
Expose whisper_full_get_segment_no_speech_prob_from_state by @sandrohanea in #2716
Use Unique Filenames for FFmpeg Conversion to Prevent File Overwrites by @NETZkultur in #2718
talk-llama : sync llama.cpp by @ggerganov in #2709
whisper : fix gpu device selection by @ggerganov in #2728
sync : ggml by @ggerganov in #2737
Fix whisper.objc Xcode project by @iamcgn in #2736
ruby : Make context accept initial parameters, API to retrieve a segment and more by @KitaitiMakoto in #2749
sync : ggml by @ggerganov in #2779
Fix coreml export script by @mgrachten in #2770
Add max_len param in node addon by @billyct in #2760
cmake : fix compile assumptions for power9/etc by @midnightmagic in #2777
Fixes for Windows by @foldl in #2790
Restore big endian support by @fitzsim in #2816
Add beam size parameter to stream example by @masahji in #2836
sync : ggml by @ggerganov in #2844
Adjusted stream example to stop on ^C when no audio is received. by @petterreinholdtsen in #2822
Use miniaudio for direct decoding flac, mp3, ogg and wav by @data-man in #2759
common : try to fix build by @ggerganov in #2845
common : separate whisper sources by @ggerganov in #2846
whisper : support GGML_BACKEND_DL by @slaren in #2843
ruby : follow audio library change by @KitaitiMakoto in #2851
fix: missing include common-whisper in addon.node by @buxuku in #2858
Fixes audio loading by miniaudio by @data-man in #2862
Updated models download URL by @AMDphreak in #2756
[Fix] When m_audio_pos overflows, m_audio_len should also increase in… by @Ivy233 in #2855
sync : ggml by @ggerganov in #2868
whisper: add xcframework build script by @mdestagnol in #2873
examples : add dl to the list of libraries linked by @danbev in #2875
ggml-ci: add run.sh by @redraskal in #2877
ggml-ci: update input env variables to GG_BUILD_ by @redraskal in #2879
examples : add GGML_USE_CPU=ON flag to whisper.objc by @danbev in #2880
Update convert-h5-to-ggml.py by @fltman in #2840
whisper : add option to use system-installed GGML by @peter277 in #2887
examples : use xcframework in whisper.objc example by @danbev in #2882
ci : add release job and include xcframework by @danbev in #2889
whisper : enable compiler warnings for src by @danbev in #2891
ci : add missing env.branch_name to build.yml by @danbev in #2896
whisper : fix compiler warnings in whisper.cpp by @danbev in #2895
ci : add ccache action to windows-cublas job by @danbev in #2893
Implementing Encoder Begin Callback for golang binding by @aderbedr in #2900
ci : refactor cuda toolkit installation steps by @danbev in #2902
examples : command.wasm updates by @danbev in #2904
examples : update wasm examples to include server.py [no ci] by @danbev in #2908
ci : use ninja and fix caching for windows-cublas by @danbev in #2910
examples : add WHISPER_SDL2 check to deprecation executables by @danbev in #2911
xcframework : add support for CoreML to ios/macOS by @danbev in #2912
ci : increase windows-cublas evict-old-files to 5d by @danbev in #2915
examples : update whisper.objc README.md by @danbev in #2916
whisper : add check for CPU backend initialization by @danbev in #2918
readme : update Python version to 3.11 for Core ML support [no -ci] by @danbev in #2919
whisper.swiftui : Add Core ML support to README [no ci] by @danbev in #2921
whisper : update default model download directory behavior to use current working directory when script is in /bin/ directory by @peter277 in #2924
ci : remove CMAKE_CUDA_ARCHITECTURES in windows-cublas by @danbev in #2923
whisper : initialize decoder's rng with unique seed by @danbev in #2932
whisper : enhance model download scripts functionality and resolve compiler warning by @peter277 in #2925
ggml : add logging for native build options/vars by @danbev in #2935
examples : fix request path for local worker files by @danbev in #2937
examples : fix nthread parsing in whisper.wasm by @danbev in #2938
examples : reduce initial memory to 512MB by @danbev in #2939
ci: fix SYCL build by @qnixsynapse in #2943
whisper.android.java : update build with ggml source changes by @danbev in #2942
whisper.android : add GGML_USE_CPU compile definition by @danbev in #2945
Update README.md by @Page-MS in #2946
bindings.javascript : update test instructions [no ci] by @danbev in #2951
bindings.java : enable copyLibs task [no ci] by @danbev in #2949
whisper : add support for backends with multiple ggml_backend_buffer_type by @eddnjjn in #2863
bindings-go : update Makefile to use cmake by @danbev in #2952
sync : ggml by @ggerganov in #2953
support progress_callback API for addon.node by @buxuku in #2941
bindings.ruby : fix test failures in test_whisper by @danbev in #2955
Adding in DetectedLanguage to go bindings by @aderbedr in #2947
sync : ggml by @ggerganov in #2962
whisper : remove unnecessary GGML_UNUSED macro by @danbev in #2960
feat: add health check endpoint to server by @sachaarbonel in #2968
ci : add github pages workflow for wasm examples by @danbev in #2969
examples : update README links to point to pages deployment by @danbev in #2971
cmake: improve Vulkan cooperative matrix support checks by @sandrohanea in #2966
sync : ggml by @ggerganov in #2972
ci : re-enable android_java job by @danbev in #2958
ci : re-enable freeBSD-latest job by @danbev in #2973
android.java : re-add ggml source updates by @danbev in #2975
tests : re-enable tests [no ci] by @danbev in #2977
ci : add coreml job that converts base.en to coreml [no ci] by @danbev in #2981
coreml: fix Whisper to CoreML conversion by disabling SDPA [no ci] by @danbev in #2979
whisper.objc : fix typo in README.md [no ci] by @danbev in #2985
ci : remove intermediate build on push to master by @danbev in #2986
examples : clarify Core ML encoder model usage [no ci] by @danbev in #2987
tests : remove gh label test-whisper-cli-tiny-en by @danbev in #2988
sync : ggml by @ggerganov in #2991
bench : update numbers [no ci] by @ggerganov in #2993
release : v1.7.5 by @ggerganov in #2994

New Contributors

@domdomegg made their first contribution in #2707
@jayant-yadav made their first contribution in #2711
@NETZkultur made their first contribution in #2718
@iamcgn made their first contribution in #2736
@mgrachten made their first contribution in #2770
@billyct made their first contribution in #2760
@midnightmagic made their first contribution in #2777
@foldl made their first contribution in #2790
@masahji made their first contribution in #2836
@data-man made their first contribution in #2759
@buxuku made their first contribution in #2858
@AMDphreak made their first contribution in #2756
@Ivy233 made their first contribution in #2855
@mdestagnol made their first contribution in #2873
@danbev made their first contribution in #2875
@redraskal made their first contribution in #2877
@fltman made their first contribution in #2840
@peter277 made their first contribution in #2887
@aderbedr made their first contribution in #2900
@qnixsynapse made their first contribution in #2943
@Page-MS made their first contribution in #2946
@eddnjjn made their first contribution in #2863

Full Changelog: v1.7.4...v1.7.5