github ggml-org/whisper.cpp v1.7.5


Overview

This is a relatively big update with various build and CI improvements, especially for iOS and WASM. There are also some performance gains, especially for the Metal backend and probably for Arm-based devices.

Big shoutout to @danbev for stepping up and completing the maintenance roadmap for this release!

Mobile examples

All mobile examples have been refreshed. The iOS examples in particular are now much easier to build thanks to the new XCFramework workflow. This should significantly simplify integrating whisper.cpp into third-party iOS and macOS apps. The CoreML build and convert instructions have also been updated.

WASM examples

The WASM examples are now automatically updated on each new commit and hosted on GitHub Pages at https://ggerganov.github.io/whisper.cpp/. Problems with CORS rules should be resolved.


Some performance numbers for this release:

M2 Ultra

Flash Attention ON:

| CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| M2 ULTRA | METAL | tiny | 1 | 1 | 7.82 | 1.31 | 0.35 | 0.01 | ad4e350 |
| M2 ULTRA | METAL | tiny-q5_0 | 1 | 1 | 8.32 | 1.28 | 0.37 | 0.01 | ad4e350 |
| M2 ULTRA | METAL | tiny-q5_1 | 1 | 1 | 8.21 | 1.28 | 0.37 | 0.01 | ad4e350 |
| M2 ULTRA | METAL | tiny-q8_0 | 1 | 1 | 7.97 | 1.23 | 0.36 | 0.01 | ad4e350 |
| M2 ULTRA | METAL | base | 1 | 1 | 13.96 | 1.80 | 0.42 | 0.02 | ad4e350 |
| M2 ULTRA | METAL | base-q5_0 | 1 | 1 | 15.19 | 1.75 | 0.42 | 0.02 | ad4e350 |
| M2 ULTRA | METAL | base-q5_1 | 1 | 1 | 15.09 | 1.75 | 0.42 | 0.02 | ad4e350 |
| M2 ULTRA | METAL | base-q8_0 | 1 | 1 | 14.45 | 1.70 | 0.41 | 0.02 | ad4e350 |
| M2 ULTRA | METAL | small | 1 | 1 | 40.08 | 3.54 | 0.86 | 0.05 | ad4e350 |
| M2 ULTRA | METAL | small-q5_0 | 1 | 1 | 45.07 | 3.51 | 0.88 | 0.05 | ad4e350 |
| M2 ULTRA | METAL | small-q5_1 | 1 | 1 | 45.05 | 3.52 | 0.88 | 0.05 | ad4e350 |
| M2 ULTRA | METAL | small-q8_0 | 1 | 1 | 42.04 | 3.34 | 0.85 | 0.05 | ad4e350 |
| M2 ULTRA | METAL | medium | 1 | 1 | 107.20 | 7.28 | 1.79 | 0.11 | ad4e350 |
| M2 ULTRA | METAL | medium-q5_0 | 1 | 1 | 125.02 | 6.67 | 1.83 | 0.12 | ad4e350 |
| M2 ULTRA | METAL | medium-q5_1 | 1 | 1 | 124.83 | 6.70 | 1.84 | 0.12 | ad4e350 |
| M2 ULTRA | METAL | medium-q8_0 | 1 | 1 | 114.56 | 6.53 | 1.79 | 0.11 | ad4e350 |
| M2 ULTRA | METAL | medium-dis | 1 | 1 | 95.96 | 1.01 | 0.23 | 0.01 | ad4e350 |
| M2 ULTRA | METAL | large-v2 | 1 | 1 | 194.29 | 10.57 | 2.67 | 0.20 | ad4e350 |
| M2 ULTRA | METAL | large-v2-q5_0 | 1 | 1 | 230.74 | 9.57 | 2.73 | 0.23 | ad4e350 |
| M2 ULTRA | METAL | large-v2-q5_1 | 1 | 1 | 229.97 | 9.69 | 2.74 | 0.23 | ad4e350 |
| M2 ULTRA | METAL | large-v2-q8_0 | 1 | 1 | 208.11 | 9.37 | 2.60 | 0.21 | ad4e350 |
| M2 ULTRA | METAL | large-v2-dis | 1 | 1 | 172.72 | 1.12 | 0.26 | 0.02 | ad4e350 |
| M2 ULTRA | METAL | large-v3-turbo | 1 | 1 | 174.46 | 1.74 | 0.42 | 0.03 | ad4e350 |
| M2 ULTRA | METAL | large-v3-turbo-q5_0 | 1 | 1 | 205.78 | 1.54 | 0.42 | 0.04 | ad4e350 |
| M2 ULTRA | METAL | large-v3-turbo-q8_0 | 1 | 1 | 186.33 | 1.50 | 0.40 | 0.03 | ad4e350 |

Flash Attention OFF:

| CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| M2 ULTRA | METAL | tiny | 1 | 0 | 8.74 | 1.20 | 0.36 | 0.01 | ad4e350 |
| M2 ULTRA | METAL | tiny-q5_0 | 1 | 0 | 10.30 | 1.15 | 0.38 | 0.01 | ad4e350 |
| M2 ULTRA | METAL | tiny-q5_1 | 1 | 0 | 10.71 | 1.13 | 0.38 | 0.01 | ad4e350 |
| M2 ULTRA | METAL | tiny-q8_0 | 1 | 0 | 9.97 | 1.12 | 0.37 | 0.01 | ad4e350 |
| M2 ULTRA | METAL | base | 1 | 0 | 16.77 | 1.71 | 0.44 | 0.02 | ad4e350 |
| M2 ULTRA | METAL | base-q5_0 | 1 | 0 | 16.92 | 1.63 | 0.44 | 0.02 | ad4e350 |
| M2 ULTRA | METAL | base-q5_1 | 1 | 0 | 16.84 | 1.63 | 0.44 | 0.02 | ad4e350 |
| M2 ULTRA | METAL | base-q8_0 | 1 | 0 | 16.12 | 1.63 | 0.44 | 0.02 | ad4e350 |
| M2 ULTRA | METAL | small | 1 | 0 | 45.29 | 3.44 | 0.92 | 0.05 | ad4e350 |
| M2 ULTRA | METAL | small-q5_0 | 1 | 0 | 50.43 | 3.34 | 0.94 | 0.06 | ad4e350 |
| M2 ULTRA | METAL | small-q5_1 | 1 | 0 | 50.49 | 3.35 | 0.93 | 0.06 | ad4e350 |
| M2 ULTRA | METAL | small-q8_0 | 1 | 0 | 47.37 | 3.20 | 0.91 | 0.05 | ad4e350 |
| M2 ULTRA | METAL | medium | 1 | 0 | 122.81 | 7.39 | 1.99 | 0.12 | ad4e350 |
| M2 ULTRA | METAL | medium-q5_0 | 1 | 0 | 140.62 | 6.73 | 2.03 | 0.14 | ad4e350 |
| M2 ULTRA | METAL | medium-q5_1 | 1 | 0 | 140.44 | 6.74 | 2.04 | 0.14 | ad4e350 |
| M2 ULTRA | METAL | medium-q8_0 | 1 | 0 | 131.05 | 6.54 | 1.95 | 0.13 | ad4e350 |
| M2 ULTRA | METAL | medium-dis | 1 | 0 | 110.95 | 0.99 | 0.24 | 0.02 | ad4e350 |
| M2 ULTRA | METAL | large-v2 | 1 | 0 | 222.19 | 10.93 | 3.01 | 0.21 | ad4e350 |
| M2 ULTRA | METAL | large-v2-q5_0 | 1 | 0 | 258.47 | 9.75 | 3.01 | 0.25 | ad4e350 |
| M2 ULTRA | METAL | large-v2-q5_1 | 1 | 0 | 258.40 | 9.85 | 3.01 | 0.24 | ad4e350 |
| M2 ULTRA | METAL | large-v2-q8_0 | 1 | 0 | 236.68 | 9.61 | 2.85 | 0.23 | ad4e350 |
| M2 ULTRA | METAL | large-v2-dis | 1 | 0 | 199.28 | 1.12 | 0.27 | 0.02 | ad4e350 |
| M2 ULTRA | METAL | large-v3-turbo | 1 | 0 | 201.49 | 1.76 | 0.45 | 0.03 | ad4e350 |
| M2 ULTRA | METAL | large-v3-turbo-q5_0 | 1 | 0 | 233.70 | 1.55 | 0.46 | 0.04 | ad4e350 |
| M2 ULTRA | METAL | large-v3-turbo-q8_0 | 1 | 0 | 214.20 | 1.51 | 0.44 | 0.04 | ad4e350 |

M4 Max

Flash Attention ON:

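| CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| M4 Max | METAL | tiny | 1 | 1 | 15.22 | 0.89 | 0.26 | 0.01 | ad4e350 |
| M4 Max | METAL | tiny-q8_0 | 1 | 1 | 14.70 | 0.86 | 0.26 | 0.01 | ad4e350 |
| M4 Max | METAL | base | 1 | 1 | 25.33 | 1.36 | 0.30 | 0.02 | ad4e350 |
| M4 Max | METAL | base-q8_0 | 1 | 1 | 21.27 | 1.31 | 0.30 | 0.02 | ad4e350 |
| M4 Max | METAL | small | 1 | 1 | 58.43 | 2.78 | 0.60 | 0.05 | ad4e350 |
| M4 Max | METAL | small-q8_0 | 1 | 1 | 60.26 | 2.39 | 0.60 | 0.05 | ad4e350 |
| M4 Max | METAL | medium | 1 | 1 | 169.73 | 6.03 | 1.31 | 0.14 | ad4e350 |
| M4 Max | METAL | medium-q8_0 | 1 | 1 | 176.61 | 4.99 | 1.31 | 0.14 | ad4e350 |
| M4 Max | METAL | large-v2 | 1 | 1 | 316.18 | 9.60 | 2.08 | 0.24 | ad4e350 |
| M4 Max | METAL | large-v2-q8_0 | 1 | 1 | 329.59 | 7.55 | 2.08 | 0.25 | ad4e350 |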

Flash Attention OFF:

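| CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| M4 Max | METAL | tiny | 1 | 0 | 13.12 | 0.87 | 0.29 | 0.01 | ad4e350 |
| M4 Max | METAL | tiny-q8_0 | 1 | 0 | 15.90 | 0.88 | 0.31 | 0.01 | ad4e350 |
| M4 Max | METAL | base | 1 | 0 | 23.10 | 1.42 | 0.34 | 0.02 | ad4e350 |
| M4 Max | METAL | base-q8_0 | 1 | 0 | 27.25 | 1.31 | 0.34 | 0.02 | ad4e350 |
| M4 Max | METAL | small | 1 | 0 | 71.76 | 3.02 | 0.70 | 0.06 | ad4e350 |
| M4 Max | METAL | small-q8_0 | 1 | 0 | 73.88 | 2.60 | 0.71 | 0.06 | ad4e350 |
| M4 Max | METAL | medium | 1 | 0 | 208.22 | 6.94 | 1.55 | 0.16 | ad4e350 |
| M4 Max | METAL | medium-q8_0 | 1 | 0 | 214.65 | 5.90 | 1.57 | 0.17 | ad4e350 |
| M4 Max | METAL | large-v2 | 1 | 0 | 381.72 | 11.28 | 2.51 | 0.29 | ad4e350 |
| M4 Max | METAL | large-v2-q8_0 | 1 | 0 | 394.97 | 8.90 | 2.45 | 0.30 | ad4e350 |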

V100

Flash Attention ON:

| GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| V100 | AVX2 CUDA | tiny | 8 | 1 | 4.01 | 0.90 | 0.25 | 0.01 | ad4e350 |
| V100 | AVX2 CUDA | tiny-q5_1 | 8 | 1 | 4.12 | 0.88 | 0.18 | 0.01 | ad4e350 |
| V100 | AVX2 CUDA | base | 8 | 1 | 7.00 | 1.30 | 0.35 | 0.01 | ad4e350 |
| V100 | AVX2 CUDA | base-q5_1 | 8 | 1 | 7.22 | 1.21 | 0.26 | 0.02 | ad4e350 |
| V100 | AVX2 CUDA | small | 8 | 1 | 18.68 | 2.39 | 0.69 | 0.03 | ad4e350 |
| V100 | AVX2 CUDA | small-q5_1 | 8 | 1 | 19.38 | 2.32 | 0.51 | 0.03 | ad4e350 |
| V100 | AVX2 CUDA | medium | 8 | 1 | 53.17 | 5.15 | 1.45 | 0.06 | ad4e350 |
| V100 | AVX2 CUDA | medium-q5_0 | 8 | 1 | 55.09 | 4.64 | 1.05 | 0.07 | ad4e350 |
| V100 | AVX2 CUDA | large-v2 | 8 | 1 | 85.77 | 7.57 | 2.19 | 0.10 | ad4e350 |
| V100 | AVX2 CUDA | large-v2-q5_0 | 8 | 1 | 89.24 | 6.48 | 1.48 | 0.11 | ad4e350 |
| V100 | AVX2 CUDA | large-v3-turbo | 8 | 1 | 75.56 | 1.25 | 0.37 | 0.02 | ad4e350 |
| V100 | AVX2 CUDA | large-v3-turbo-q5_0 | 8 | 1 | 78.48 | 1.01 | 0.24 | 0.02 | ad4e350 |

Flash Attention OFF:

| GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| V100 | AVX2 CUDA | tiny | 8 | 0 | 6.15 | 1.02 | 0.30 | 0.01 | ad4e350 |
| V100 | AVX2 CUDA | tiny-q5_1 | 8 | 0 | 5.92 | 0.96 | 0.25 | 0.01 | ad4e350 |
| V100 | AVX2 CUDA | base | 8 | 0 | 10.60 | 1.43 | 0.43 | 0.02 | ad4e350 |
| V100 | AVX2 CUDA | base-q5_1 | 8 | 0 | 10.80 | 1.37 | 0.36 | 0.02 | ad4e350 |
| V100 | AVX2 CUDA | small | 8 | 0 | 31.83 | 2.82 | 0.87 | 0.04 | ad4e350 |
| V100 | AVX2 CUDA | small-q5_1 | 8 | 0 | 31.88 | 2.68 | 0.72 | 0.04 | ad4e350 |
| V100 | AVX2 CUDA | medium | 8 | 0 | 81.30 | 6.02 | 1.81 | 0.09 | ad4e350 |
| V100 | AVX2 CUDA | medium-q5_0 | 8 | 0 | 83.21 | 5.44 | 1.41 | 0.10 | ad4e350 |
| V100 | AVX2 CUDA | large-v2 | 8 | 0 | 134.81 | 8.64 | 2.69 | 0.14 | ad4e350 |
| V100 | AVX2 CUDA | large-v2-q5_0 | 8 | 0 | 138.95 | 7.57 | 2.04 | 0.15 | ad4e350 |
| V100 | AVX2 CUDA | large-v3-turbo | 8 | 0 | 124.42 | 1.37 | 0.43 | 0.02 | ad4e350 |
| V100 | AVX2 CUDA | large-v3-turbo-q5_0 | 8 | 0 | 127.81 | 1.13 | 0.32 | 0.03 | ad4e350 |
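
To put the Flash Attention ON and OFF tables side by side, here is a small Python sketch that computes the encoder speedup for the unquantized large-v2 model on each device. It assumes the Enc. column is a timing where lower is better, which the tables themselves do not state; the `fa_speedup` helper and the `ENC_LARGE_V2` dictionary are illustrative names, not part of whisper.cpp.

```python
# Enc. values for the unquantized large-v2 model, copied from the tables
# above as (Flash Attention ON, Flash Attention OFF).
# Assumption: Enc. is a timing, so a smaller number means a faster run.
ENC_LARGE_V2 = {
    "M2 Ultra (Metal)": (194.29, 222.19),
    "M4 Max (Metal)": (316.18, 381.72),
    "V100 (CUDA)": (85.77, 134.81),
}


def fa_speedup(fa_on: float, fa_off: float) -> float:
    """Return how many times faster the FA ON run is than the FA OFF run."""
    return fa_off / fa_on


for device, (on, off) in ENC_LARGE_V2.items():
    print(f"{device}: {fa_speedup(on, off):.2f}x")
```

Under that assumption, the ratio comes out to roughly 1.14x on the M2 Ultra, 1.21x on the M4 Max, and 1.57x on the V100 for this one model. Other rows can be compared the same way by swapping in their Enc. values.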

What's Changed

New Contributors

Full Changelog: v1.7.4...v1.7.5
