github ggerganov/whisper.cpp v1.7.0

Overview

  • Fix crashes with high number of beams
  • Reduce overall VRAM usage
  • Optimize Encoder performance

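Some performance numbers for this release. Column key: Th = number of threads, FA = Flash Attention (1 = on, 0 = off), Enc. = encoder time, Dec. = single-token decoding time, Bch5 = batched decoding time (5 tokens), PP = prompt processing time, Commit = whisper.cpp commit benchmarked. All times are in milliseconds.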

M2 Ultra

Flash Attention ON:

| GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| M2 Ultra | METAL | tiny | 1 | 1 | 8.37 | 1.44 | 0.48 | 0.01 | 6a94163 |
| M2 Ultra | METAL | tiny-q5_0 | 1 | 1 | 9.81 | 1.46 | 0.50 | 0.01 | 6a94163 |
| M2 Ultra | METAL | tiny-q5_1 | 1 | 1 | 8.80 | 1.47 | 0.50 | 0.01 | 6a94163 |
| M2 Ultra | METAL | base | 1 | 1 | 16.11 | 1.96 | 0.74 | 0.02 | 6a94163 |
| M2 Ultra | METAL | base-q5_0 | 1 | 1 | 16.38 | 1.99 | 0.78 | 0.02 | 6a94163 |
| M2 Ultra | METAL | base-q5_1 | 1 | 1 | 16.72 | 2.00 | 0.77 | 0.02 | 6a94163 |
| M2 Ultra | METAL | small | 1 | 1 | 41.26 | 3.88 | 1.66 | 0.05 | 6a94163 |
| M2 Ultra | METAL | small-q5_0 | 1 | 1 | 46.91 | 4.02 | 1.76 | 0.06 | 6a94163 |
| M2 Ultra | METAL | small-q5_1 | 1 | 1 | 47.05 | 4.00 | 1.73 | 0.06 | 6a94163 |
| M2 Ultra | METAL | medium | 1 | 1 | 111.29 | 7.79 | 3.63 | 0.11 | 6a94163 |
| M2 Ultra | METAL | medium-q5_0 | 1 | 1 | 129.78 | 7.71 | 3.85 | 0.13 | 6a94163 |
| M2 Ultra | METAL | medium-q5_1 | 1 | 1 | 129.29 | 7.71 | 3.87 | 0.13 | 6a94163 |
| M2 Ultra | METAL | medium-dis | 1 | 1 | 99.27 | 1.09 | 0.43 | 0.02 | 6a94163 |
| M2 Ultra | METAL | large-v2 | 1 | 1 | 198.81 | 11.54 | 5.59 | 0.20 | 6a94163 |
| M2 Ultra | METAL | large-v2-q5_0 | 1 | 1 | 236.18 | 11.12 | 6.11 | 0.24 | 6a94163 |
| M2 Ultra | METAL | large-v2-q5_1 | 1 | 1 | 235.88 | 11.14 | 6.01 | 0.24 | 6a94163 |
| M2 Ultra | METAL | large-v2-dis | 1 | 1 | 177.41 | 1.21 | 0.48 | 0.02 | 6a94163 |
| M2 Ultra | METAL | large-v3-turbo | 1 | 1 | 178.92 | 1.89 | 0.83 | 0.03 | 6a94163 |
| M2 Ultra | METAL | large-v3-turbo-q5_0 | 1 | 1 | 211.44 | 1.73 | 0.90 | 0.04 | 6a94163 |

Flash Attention OFF:

| GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| M2 Ultra | METAL | tiny | 1 | 0 | 10.04 | 1.37 | 0.50 | 0.01 | 6a94163 |
| M2 Ultra | METAL | tiny-q5_0 | 1 | 0 | 10.02 | 1.36 | 0.53 | 0.01 | 6a94163 |
| M2 Ultra | METAL | tiny-q5_1 | 1 | 0 | 11.08 | 1.37 | 0.53 | 0.01 | 6a94163 |
| M2 Ultra | METAL | base | 1 | 0 | 17.84 | 1.93 | 0.77 | 0.02 | 6a94163 |
| M2 Ultra | METAL | base-q5_0 | 1 | 0 | 18.57 | 1.92 | 0.81 | 0.02 | 6a94163 |
| M2 Ultra | METAL | base-q5_1 | 1 | 0 | 18.66 | 1.93 | 0.82 | 0.02 | 6a94163 |
| M2 Ultra | METAL | small | 1 | 0 | 48.26 | 3.95 | 1.73 | 0.05 | 6a94163 |
| M2 Ultra | METAL | small-q5_0 | 1 | 0 | 53.68 | 3.99 | 1.85 | 0.06 | 6a94163 |
| M2 Ultra | METAL | small-q5_1 | 1 | 0 | 53.86 | 4.00 | 1.82 | 0.06 | 6a94163 |
| M2 Ultra | METAL | medium | 1 | 0 | 130.09 | 8.01 | 3.82 | 0.13 | 6a94163 |
| M2 Ultra | METAL | medium-q5_0 | 1 | 0 | 148.18 | 7.92 | 4.11 | 0.14 | 6a94163 |
| M2 Ultra | METAL | medium-q5_1 | 1 | 0 | 147.95 | 7.94 | 4.11 | 0.14 | 6a94163 |
| M2 Ultra | METAL | medium-dis | 1 | 0 | 116.97 | 1.11 | 0.42 | 0.02 | 6a94163 |
| M2 Ultra | METAL | large-v2 | 1 | 0 | 232.43 | 12.34 | 5.87 | 0.22 | 6a94163 |
| M2 Ultra | METAL | large-v2-q5_0 | 1 | 0 | 269.72 | 11.68 | 6.44 | 0.26 | 6a94163 |
| M2 Ultra | METAL | large-v2-q5_1 | 1 | 0 | 269.71 | 11.82 | 6.36 | 0.26 | 6a94163 |
| M2 Ultra | METAL | large-v2-dis | 1 | 0 | 209.25 | 1.25 | 0.48 | 0.02 | 6a94163 |
| M2 Ultra | METAL | large-v3-turbo | 1 | 0 | 211.09 | 1.98 | 0.84 | 0.03 | 6a94163 |
| M2 Ultra | METAL | large-v3-turbo-q5_0 | 1 | 0 | 244.23 | 1.81 | 0.92 | 0.04 | 6a94163 |
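The two M2 Ultra tables can be compared directly: dividing the FA-OFF encoder time by the FA-ON encoder time for the same model gives the encoder speedup from Flash Attention on this machine. A minimal Python sketch, with a subset of the `Enc.` values copied by hand from the tables above; the dictionary names and the `fa_speedup` helper are illustrative only:

```python
# Encoder times (ms) for M2 Ultra / METAL, copied from the FA ON and FA OFF tables.
enc_fa_on = {
    "tiny": 8.37,
    "base": 16.11,
    "small": 41.26,
    "medium": 111.29,
    "large-v2": 198.81,
    "large-v3-turbo": 178.92,
}
enc_fa_off = {
    "tiny": 10.04,
    "base": 17.84,
    "small": 48.26,
    "medium": 130.09,
    "large-v2": 232.43,
    "large-v3-turbo": 211.09,
}


def fa_speedup(on: dict[str, float], off: dict[str, float]) -> dict[str, float]:
    """Return the off/on encoder-time ratio per model (> 1 means FA is faster)."""
    return {model: off[model] / on[model] for model in on}


for model, ratio in fa_speedup(enc_fa_on, enc_fa_off).items():
    print(f"{model:>16}: {ratio:.2f}x")
```

For the models listed, the ratio falls between about 1.11x (base) and 1.20x (tiny).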

Ryzen 9 5950X + RTX 2060

Flash Attention ON:

| GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RTX 2060 | AVX2 CUDA | tiny | 1 | 1 | 7.35 | 0.78 | 0.24 | 0.01 | 6a94163 |
| RTX 2060 | AVX2 CUDA | tiny-q5_0 | 1 | 1 | 6.45 | 0.67 | 0.14 | 0.01 | 6a94163 |
| RTX 2060 | AVX2 CUDA | tiny-q5_1 | 1 | 1 | 6.39 | 0.66 | 0.14 | 0.01 | 6a94163 |
| RTX 2060 | AVX2 CUDA | base | 1 | 1 | 10.20 | 0.88 | 0.30 | 0.01 | 6a94163 |
| RTX 2060 | AVX2 CUDA | base-q5_0 | 1 | 1 | 11.38 | 0.92 | 0.21 | 0.02 | 6a94163 |
| RTX 2060 | AVX2 CUDA | base-q5_1 | 1 | 1 | 11.76 | 0.91 | 0.20 | 0.02 | 6a94163 |
| RTX 2060 | AVX2 CUDA | small | 1 | 1 | 33.06 | 2.00 | 0.56 | 0.03 | 6a94163 |
| RTX 2060 | AVX2 CUDA | small-q5_0 | 1 | 1 | 35.84 | 1.84 | 0.43 | 0.04 | 6a94163 |
| RTX 2060 | AVX2 CUDA | small-q5_1 | 1 | 1 | 36.89 | 1.82 | 0.42 | 0.04 | 6a94163 |
| RTX 2060 | AVX2 CUDA | medium | 1 | 1 | 90.65 | 4.54 | 1.13 | 0.08 | 6a94163 |
| RTX 2060 | AVX2 CUDA | medium-q5_0 | 1 | 1 | 104.01 | 3.80 | 0.91 | 0.10 | 6a94163 |
| RTX 2060 | AVX2 CUDA | medium-q5_1 | 1 | 1 | 107.98 | 3.72 | 0.87 | 0.10 | 6a94163 |
| RTX 2060 | AVX2 CUDA | medium-dis | 1 | 1 | 79.08 | 0.68 | 0.17 | 0.01 | 6a94163 |
| RTX 2060 | AVX2 CUDA | large-v2 | 1 | 1 | 162.00 | 7.52 | 1.92 | 0.14 | 6a94163 |
| RTX 2060 | AVX2 CUDA | large-v2-q5_0 | 1 | 1 | 184.59 | 5.64 | 1.50 | 0.16 | 6a94163 |
| RTX 2060 | AVX2 CUDA | large-v2-q5_1 | 1 | 1 | 193.85 | 5.55 | 1.44 | 0.17 | 6a94163 |
| RTX 2060 | AVX2 CUDA | large-v2-dis | 1 | 1 | 140.75 | 0.84 | 0.37 | 0.02 | 6a94163 |
| RTX 2060 | AVX2 CUDA | large-v3-turbo | 1 | 1 | 143.38 | 1.29 | 0.36 | 0.02 | 6a94163 |
| RTX 2060 | AVX2 CUDA | large-v3-turbo-q5_0 | 1 | 1 | 163.30 | 0.93 | 0.28 | 0.03 | 6a94163 |

Flash Attention OFF:

| GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RTX 2060 | AVX2 CUDA | tiny | 1 | 0 | 12.49 | 0.87 | 0.23 | 0.01 | 6a94163 |
| RTX 2060 | AVX2 CUDA | tiny-q5_0 | 1 | 0 | 10.65 | 0.78 | 0.19 | 0.02 | 6a94163 |
| RTX 2060 | AVX2 CUDA | tiny-q5_1 | 1 | 0 | 10.82 | 0.77 | 0.19 | 0.02 | 6a94163 |
| RTX 2060 | AVX2 CUDA | base | 1 | 0 | 18.97 | 1.04 | 0.34 | 0.02 | 6a94163 |
| RTX 2060 | AVX2 CUDA | base-q5_0 | 1 | 0 | 20.22 | 1.09 | 0.27 | 0.02 | 6a94163 |
| RTX 2060 | AVX2 CUDA | base-q5_1 | 1 | 0 | 20.48 | 1.07 | 0.27 | 0.02 | 6a94163 |
| RTX 2060 | AVX2 CUDA | small | 1 | 0 | 59.52 | 2.37 | 0.70 | 0.05 | 6a94163 |
| RTX 2060 | AVX2 CUDA | small-q5_0 | 1 | 0 | 62.98 | 2.23 | 0.60 | 0.06 | 6a94163 |
| RTX 2060 | AVX2 CUDA | small-q5_1 | 1 | 0 | 63.64 | 2.21 | 0.59 | 0.06 | 6a94163 |
| RTX 2060 | AVX2 CUDA | medium | 1 | 0 | 161.53 | 5.36 | 1.53 | 0.13 | 6a94163 |
| RTX 2060 | AVX2 CUDA | medium-q5_0 | 1 | 0 | 174.96 | 4.64 | 1.32 | 0.15 | 6a94163 |
| RTX 2060 | AVX2 CUDA | medium-q5_1 | 1 | 0 | 178.42 | 4.57 | 1.29 | 0.15 | 6a94163 |
| RTX 2060 | AVX2 CUDA | medium-dis | 1 | 0 | 149.65 | 0.75 | 0.20 | 0.02 | 6a94163 |
| RTX 2060 | AVX2 CUDA | large-v2 | 1 | 0 | 280.55 | 8.74 | 2.51 | 0.23 | 6a94163 |
| RTX 2060 | AVX2 CUDA | large-v2-q5_0 | 1 | 0 | 306.87 | 6.92 | 2.08 | 0.25 | 6a94163 |
| RTX 2060 | AVX2 CUDA | large-v2-q5_1 | 1 | 0 | 314.25 | 6.82 | 2.02 | 0.26 | 6a94163 |
| RTX 2060 | AVX2 CUDA | large-v2-dis | 1 | 0 | 259.39 | 0.91 | 0.37 | 0.02 | 6a94163 |
| RTX 2060 | AVX2 CUDA | large-v3-turbo | 1 | 0 | 261.83 | 1.44 | 0.41 | 0.04 | 6a94163 |
| RTX 2060 | AVX2 CUDA | large-v3-turbo-q5_0 | 1 | 0 | 282.99 | 1.09 | 0.33 | 0.04 | 6a94163 |

| CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ryzen 9 5950X | AVX2 | tiny | 16 | 0 | 137.31 | 1.38 | 0.37 | 0.20 | 6a94163 |
| Ryzen 9 5950X | AVX2 | tiny-q5_0 | 16 | 0 | 143.29 | 0.54 | 0.25 | 0.19 | 6a94163 |
| Ryzen 9 5950X | AVX2 | tiny-q5_1 | 16 | 0 | 144.11 | 0.58 | 0.27 | 0.20 | 6a94163 |
| Ryzen 9 5950X | AVX2 | base | 16 | 0 | 293.81 | 3.15 | 0.80 | 0.33 | 6a94163 |
| Ryzen 9 5950X | AVX2 | base-q5_0 | 16 | 0 | 311.95 | 1.18 | 0.45 | 0.32 | 6a94163 |
| Ryzen 9 5950X | AVX2 | base-q5_1 | 16 | 0 | 319.06 | 1.26 | 0.49 | 0.34 | 6a94163 |
| Ryzen 9 5950X | AVX2 | small | 16 | 0 | 1005.64 | 11.78 | 2.79 | 0.88 | 6a94163 |
| Ryzen 9 5950X | AVX2 | small-q5_0 | 16 | 0 | 1110.41 | 5.44 | 1.53 | 0.91 | 6a94163 |
| Ryzen 9 5950X | AVX2 | small-q5_1 | 16 | 0 | 1159.07 | 5.72 | 1.66 | 0.94 | 6a94163 |
| Ryzen 9 5950X | AVX2 | medium | 16 | 0 | 3004.36 | 36.61 | 8.21 | 2.32 | 6a94163 |
| Ryzen 9 5950X | AVX2 | medium-q5_0 | 16 | 0 | 3441.00 | 17.69 | 4.67 | 2.52 | 6a94163 |
| Ryzen 9 5950X | AVX2 | medium-q5_1 | 16 | 0 | 3588.38 | 18.61 | 4.93 | 2.63 | 6a94163 |
| Ryzen 9 5950X | AVX2 | medium-dis | 16 | 0 | 2805.43 | 4.94 | 1.12 | 0.39 | 6a94163 |
| Ryzen 9 5950X | AVX2 | large-v2 | 16 | 0 | 5630.44 | 70.50 | 15.52 | 4.16 | 6a94163 |
| Ryzen 9 5950X | AVX2 | large-v2-q5_0 | 16 | 0 | 6488.80 | 35.07 | 8.61 | 4.64 | 6a94163 |
| Ryzen 9 5950X | AVX2 | large-v2-q5_1 | 16 | 0 | 6775.80 | 36.27 | 8.92 | 4.85 | 6a94163 |
| Ryzen 9 5950X | AVX2 | large-v2-dis | 16 | 0 | 5262.10 | 7.27 | 1.60 | 0.52 | 6a94163 |
| Ryzen 9 5950X | AVX2 | large-v3-turbo | 16 | 0 | 5302.64 | 11.52 | 2.55 | 0.76 | 6a94163 |
| Ryzen 9 5950X | AVX2 | large-v3-turbo-q5_0 | 16 | 0 | 5984.73 | 4.26 | 1.16 | 0.80 | 6a94163 |
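In whitespace-separated form, the hardware and Config cells contain spaces ("RTX 2060", "AVX2 CUDA", "Ryzen 9 5950X"), so a row cannot be split naively into its ten columns, and the boundary between the hardware name and Config is ambiguous. A minimal Python sketch, assuming only that the last eight columns (Model through Commit) never contain spaces: it splits each row from the right, keeps hardware and Config together as one string, and then compares the large-v3-turbo encoder times. `BenchRow` and `parse_row` are hypothetical helper names, not part of whisper.cpp.

```python
from dataclasses import dataclass


@dataclass
class BenchRow:
    hw_config: str  # hardware name + build config, e.g. "RTX 2060 AVX2 CUDA"
    model: str
    threads: int
    flash_attn: bool
    enc_ms: float
    dec_ms: float
    bch5_ms: float
    pp_ms: float
    commit: str


def parse_row(line: str) -> BenchRow:
    # The leading cells may contain spaces, but the last eight never do,
    # so split at most 8 times from the right on whitespace.
    prefix, model, th, fa, enc, dec, bch5, pp, commit = line.rsplit(None, 8)
    return BenchRow(prefix, model, int(th), fa == "1", float(enc),
                    float(dec), float(bch5), float(pp), commit)


rows = [parse_row(s) for s in (
    "RTX 2060 AVX2 CUDA large-v3-turbo 1 1 143.38 1.29 0.36 0.02 6a94163",
    "RTX 2060 AVX2 CUDA large-v3-turbo 1 0 261.83 1.44 0.41 0.04 6a94163",
    "Ryzen 9 5950X AVX2 large-v3-turbo 16 0 5302.64 11.52 2.55 0.76 6a94163",
)]
gpu_fa_on, gpu_fa_off, cpu = rows

print(f"FA encoder speedup on RTX 2060: {gpu_fa_off.enc_ms / gpu_fa_on.enc_ms:.2f}x")
print(f"RTX 2060 (FA on) vs 16-thread CPU: {cpu.enc_ms / gpu_fa_on.enc_ms:.1f}x")
```

For large-v3-turbo this gives an encoder ratio of about 1.83x for Flash Attention on the RTX 2060, and about 37x for the RTX 2060 with Flash Attention over the 16-thread CPU run.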

V100

Flash Attention ON:

| GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| V100 | AVX2 CUDA | tiny | 1 | 1 | 4.10 | 0.96 | 0.27 | 0.01 | 6a94163 |
| V100 | AVX2 CUDA | tiny-q5_1 | 1 | 1 | 4.32 | 1.01 | 0.21 | 0.01 | 6a94163 |
| V100 | AVX2 CUDA | base | 1 | 1 | 7.23 | 1.30 | 0.35 | 0.02 | 6a94163 |
| V100 | AVX2 CUDA | base-q5_1 | 1 | 1 | 7.51 | 1.32 | 0.27 | 0.02 | 6a94163 |
| V100 | AVX2 CUDA | small | 1 | 1 | 19.44 | 2.59 | 0.73 | 0.03 | 6a94163 |
| V100 | AVX2 CUDA | small-q5_1 | 1 | 1 | 21.46 | 2.61 | 0.54 | 0.03 | 6a94163 |
| V100 | AVX2 CUDA | medium | 1 | 1 | 54.26 | 5.36 | 1.53 | 0.06 | 6a94163 |
| V100 | AVX2 CUDA | medium-q5_0 | 1 | 1 | 56.13 | 5.01 | 1.04 | 0.07 | 6a94163 |
| V100 | AVX2 CUDA | large-v2 | 1 | 1 | 94.48 | 7.80 | 2.18 | 0.10 | 6a94163 |
| V100 | AVX2 CUDA | large-v2-q5_0 | 1 | 1 | 93.55 | 6.98 | 1.51 | 0.11 | 6a94163 |
| V100 | AVX2 CUDA | large-v3-turbo | 1 | 1 | 77.11 | 1.27 | 0.39 | 0.02 | 6a94163 |
| V100 | AVX2 CUDA | large-v3-turbo-q5_0 | 1 | 1 | 80.22 | 1.10 | 0.31 | 0.02 | 6a94163 |

Flash Attention OFF:

| GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| V100 | AVX2 CUDA | tiny | 1 | 0 | 6.03 | 1.09 | 0.31 | 0.01 | 6a94163 |
| V100 | AVX2 CUDA | tiny-q5_1 | 1 | 0 | 6.10 | 1.13 | 0.26 | 0.01 | 6a94163 |
| V100 | AVX2 CUDA | base | 1 | 0 | 11.02 | 1.65 | 0.46 | 0.02 | 6a94163 |
| V100 | AVX2 CUDA | base-q5_1 | 1 | 0 | 11.18 | 1.74 | 0.39 | 0.02 | 6a94163 |
| V100 | AVX2 CUDA | small | 1 | 0 | 31.05 | 3.10 | 0.85 | 0.04 | 6a94163 |
| V100 | AVX2 CUDA | small-q5_1 | 1 | 0 | 31.75 | 3.12 | 0.71 | 0.04 | 6a94163 |
| V100 | AVX2 CUDA | medium | 1 | 0 | 83.99 | 6.40 | 1.82 | 0.09 | 6a94163 |
| V100 | AVX2 CUDA | medium-q5_0 | 1 | 0 | 85.90 | 6.11 | 1.42 | 0.10 | 6a94163 |
| V100 | AVX2 CUDA | large-v2 | 1 | 0 | 139.13 | 9.05 | 2.67 | 0.14 | 6a94163 |
| V100 | AVX2 CUDA | large-v2-q5_0 | 1 | 0 | 142.98 | 8.47 | 2.06 | 0.16 | 6a94163 |
| V100 | AVX2 CUDA | large-v3-turbo | 1 | 0 | 126.75 | 1.51 | 0.45 | 0.02 | 6a94163 |
| V100 | AVX2 CUDA | large-v3-turbo-q5_0 | 1 | 0 | 129.91 | 1.30 | 0.35 | 0.03 | 6a94163 |
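On the V100 with Flash Attention ON, comparing each q5_0 row with its unquantized counterpart shows encoder time staying within a few percent while batched-decoding (Bch5) time drops by roughly 30%. A minimal Python sketch of that percent-change arithmetic, with the medium and large-v2 values copied by hand from the FA ON table; `pct_change` is an illustrative helper, not part of whisper.cpp:

```python
# V100 / AVX2 CUDA, Flash Attention ON: (Enc., Dec., Bch5) in ms.
fp = {"medium": (54.26, 5.36, 1.53), "large-v2": (94.48, 7.80, 2.18)}
q5 = {"medium": (56.13, 5.01, 1.04), "large-v2": (93.55, 6.98, 1.51)}


def pct_change(base: float, new: float) -> float:
    """Percent change from base to new (negative means new takes less time)."""
    return 100.0 * (new - base) / base


for model in fp:
    # Pair each unquantized column with the matching q5_0 column.
    enc, dec, bch5 = (pct_change(b, n) for b, n in zip(fp[model], q5[model]))
    print(f"{model}-q5_0: Enc. {enc:+.1f}%  Dec. {dec:+.1f}%  Bch5 {bch5:+.1f}%")
```

For these two models the encoder changes by +3.4% (medium) and -1.0% (large-v2), single-token decoding by -6.5% and -10.5%, and Bch5 by -32.0% and -30.7%.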

For reference, here is the performance for v1.6.0

Full Changelog: v1.6.2...v1.7.0

Binaries

https://github.com/ggerganov/whisper.cpp/actions/runs/11193706782
