Overview
- Flash attention is now enabled by default
- Performance improvements
M1 Pro
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M1 Pro | METAL | tiny | 1 | 0 | 32.44 | 1.71 | 0.43 | 0.04 | 8a67c55 |
M1 Pro | METAL | base | 1 | 0 | 63.54 | 2.62 | 0.71 | 0.06 | 8a67c55 |
M1 Pro | METAL | small | 1 | 0 | 200.30 | 5.34 | 1.72 | 0.17 | 8a67c55 |
M1 Pro | METAL | medium | 1 | 0 | 580.06 | 11.71 | 4.18 | 0.45 | 8a67c55 |
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M1 Pro | METAL | tiny | 1 | 1 | 22.09 | 1.84 | 0.43 | 0.03 | 8a67c55 |
M1 Pro | METAL | base | 1 | 1 | 40.57 | 2.22 | 0.44 | 0.04 | 8a67c55 |
M1 Pro | METAL | small | 1 | 1 | 135.15 | 4.23 | 0.95 | 0.12 | 8a67c55 |
M1 Pro | METAL | medium | 1 | 1 | 395.18 | 9.14 | 2.21 | 0.30 | 8a67c55 |
M2 Ultra
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 ULTRA | METAL | tiny | 1 | 0 | 8.63 | 1.09 | 0.27 | 0.01 | b57b9d3 |
M2 ULTRA | METAL | tiny-q5_0 | 1 | 0 | 9.04 | 1.06 | 0.28 | 0.01 | b57b9d3 |
M2 ULTRA | METAL | tiny-q5_1 | 1 | 0 | 8.98 | 1.06 | 0.28 | 0.01 | b57b9d3 |
M2 ULTRA | METAL | tiny-q8_0 | 1 | 0 | 8.69 | 1.06 | 0.27 | 0.01 | b57b9d3 |
M2 ULTRA | METAL | base | 1 | 0 | 15.39 | 1.54 | 0.43 | 0.02 | b57b9d3 |
M2 ULTRA | METAL | base-q5_0 | 1 | 0 | 16.50 | 1.50 | 0.42 | 0.02 | b57b9d3 |
M2 ULTRA | METAL | base-q5_1 | 1 | 0 | 16.45 | 1.49 | 0.43 | 0.02 | b57b9d3 |
M2 ULTRA | METAL | base-q8_0 | 1 | 0 | 15.62 | 1.51 | 0.42 | 0.02 | b57b9d3 |
M2 ULTRA | METAL | small | 1 | 0 | 45.99 | 2.99 | 0.90 | 0.05 | b57b9d3 |
M2 ULTRA | METAL | small-q5_0 | 1 | 0 | 50.65 | 2.98 | 0.92 | 0.06 | b57b9d3 |
M2 ULTRA | METAL | small-q5_1 | 1 | 0 | 50.74 | 2.96 | 0.92 | 0.06 | b57b9d3 |
M2 ULTRA | METAL | small-q8_0 | 1 | 0 | 47.16 | 2.83 | 0.89 | 0.06 | b57b9d3 |
M2 ULTRA | METAL | medium | 1 | 0 | 132.78 | 6.46 | 2.02 | 0.13 | b57b9d3 |
M2 ULTRA | METAL | medium-q5_0 | 1 | 0 | 149.35 | 6.11 | 2.09 | 0.14 | b57b9d3 |
M2 ULTRA | METAL | medium-q5_1 | 1 | 0 | 149.11 | 6.09 | 2.11 | 0.14 | b57b9d3 |
M2 ULTRA | METAL | medium-q8_0 | 1 | 0 | 137.37 | 6.05 | 2.03 | 0.13 | b57b9d3 |
M2 ULTRA | METAL | medium-dis | 1 | 0 | 121.60 | 0.90 | 0.25 | 0.02 | b57b9d3 |
M2 ULTRA | METAL | large-v2 | 1 | 0 | 231.19 | 9.40 | 3.10 | 0.22 | b57b9d3 |
M2 ULTRA | METAL | large-v2-q5_0 | 1 | 0 | 265.90 | 8.98 | 3.11 | 0.25 | b57b9d3 |
M2 ULTRA | METAL | large-v2-q5_1 | 1 | 0 | 265.18 | 8.92 | 3.13 | 0.25 | b57b9d3 |
M2 ULTRA | METAL | large-v2-q8_0 | 1 | 0 | 240.23 | 9.06 | 2.98 | 0.23 | b57b9d3 |
M2 ULTRA | METAL | large-v2-dis | 1 | 0 | 210.25 | 0.99 | 0.28 | 0.02 | b57b9d3 |
M2 ULTRA | METAL | large-v3-turbo | 1 | 0 | 211.72 | 1.52 | 0.46 | 0.03 | b57b9d3 |
M2 ULTRA | METAL | large-v3-turbo-q5_0 | 1 | 0 | 242.17 | 1.40 | 0.47 | 0.04 | b57b9d3 |
M2 ULTRA | METAL | large-v3-turbo-q8_0 | 1 | 0 | 219.75 | 1.40 | 0.45 | 0.04 | b57b9d3 |
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 ULTRA | METAL | tiny | 1 | 1 | 6.28 | 0.96 | 0.22 | 0.01 | a77d11d |
M2 ULTRA | METAL | tiny-q5_0 | 1 | 1 | 6.69 | 0.92 | 0.22 | 0.01 | a77d11d |
M2 ULTRA | METAL | tiny-q5_1 | 1 | 1 | 6.67 | 0.91 | 0.22 | 0.01 | a77d11d |
M2 ULTRA | METAL | tiny-q8_0 | 1 | 1 | 6.34 | 0.92 | 0.21 | 0.01 | a77d11d |
M2 ULTRA | METAL | base | 1 | 1 | 10.77 | 1.30 | 0.32 | 0.02 | a77d11d |
M2 ULTRA | METAL | base-q5_0 | 1 | 1 | 11.84 | 1.23 | 0.33 | 0.02 | a77d11d |
M2 ULTRA | METAL | base-q5_1 | 1 | 1 | 11.95 | 1.24 | 0.33 | 0.02 | a77d11d |
M2 ULTRA | METAL | base-q8_0 | 1 | 1 | 11.14 | 1.23 | 0.32 | 0.02 | a77d11d |
M2 ULTRA | METAL | small | 1 | 1 | 32.12 | 2.43 | 0.65 | 0.04 | a77d11d |
M2 ULTRA | METAL | small-q5_0 | 1 | 1 | 36.95 | 2.42 | 0.68 | 0.04 | a77d11d |
M2 ULTRA | METAL | small-q5_1 | 1 | 1 | 37.40 | 2.42 | 0.68 | 0.04 | a77d11d |
M2 ULTRA | METAL | small-q8_0 | 1 | 1 | 33.48 | 2.30 | 0.65 | 0.04 | a77d11d |
M2 ULTRA | METAL | medium | 1 | 1 | 89.28 | 5.05 | 1.46 | 0.09 | a77d11d |
M2 ULTRA | METAL | medium-q5_0 | 1 | 1 | 105.24 | 4.89 | 1.48 | 0.11 | a77d11d |
M2 ULTRA | METAL | medium-q5_1 | 1 | 1 | 105.28 | 4.98 | 1.49 | 0.11 | a77d11d |
M2 ULTRA | METAL | medium-q8_0 | 1 | 1 | 93.61 | 4.89 | 1.43 | 0.10 | a77d11d |
M2 ULTRA | METAL | medium-dis | 1 | 1 | 78.44 | 0.81 | 0.20 | 0.01 | a77d11d |
M2 ULTRA | METAL | large-v2 | 1 | 1 | 165.69 | 7.50 | 2.16 | 0.17 | a77d11d |
M2 ULTRA | METAL | large-v2-q5_0 | 1 | 1 | 199.40 | 7.37 | 2.18 | 0.20 | a77d11d |
M2 ULTRA | METAL | large-v2-q5_1 | 1 | 1 | 199.29 | 7.37 | 2.21 | 0.20 | a77d11d |
M2 ULTRA | METAL | large-v2-q8_0 | 1 | 1 | 174.60 | 6.87 | 2.16 | 0.18 | a77d11d |
M2 ULTRA | METAL | large-v2-dis | 1 | 1 | 145.80 | 0.90 | 0.22 | 0.02 | a77d11d |
M2 ULTRA | METAL | large-v3-turbo | 1 | 1 | 146.98 | 1.31 | 0.34 | 0.03 | a77d11d |
M2 ULTRA | METAL | large-v3-turbo-q5_0 | 1 | 1 | 176.77 | 1.19 | 0.35 | 0.03 | a77d11d |
M2 ULTRA | METAL | large-v3-turbo-q8_0 | 1 | 1 | 154.73 | 1.20 | 0.33 | 0.03 | a77d11d |
M4 Max
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M4 Max | METAL | tiny | 1 | 0 | 10.51 | 0.86 | 0.23 | 0.01 | 47fcd7d |
M4 Max | METAL | tiny-q8_0 | 1 | 0 | 10.73 | 0.84 | 0.24 | 0.01 | 47fcd7d |
M4 Max | METAL | base | 1 | 0 | 19.50 | 1.34 | 0.36 | 0.02 | 47fcd7d |
M4 Max | METAL | base-q8_0 | 1 | 0 | 20.17 | 1.25 | 0.36 | 0.02 | 47fcd7d |
M4 Max | METAL | small | 1 | 0 | 61.91 | 2.77 | 0.78 | 0.06 | 47fcd7d |
M4 Max | METAL | small-q8_0 | 1 | 0 | 64.17 | 2.43 | 0.78 | 0.06 | 47fcd7d |
M4 Max | METAL | medium | 1 | 0 | 181.50 | 6.44 | 1.85 | 0.15 | 47fcd7d |
M4 Max | METAL | medium-q8_0 | 1 | 0 | 187.71 | 5.80 | 1.84 | 0.15 | 47fcd7d |
M4 Max | METAL | large-v2 | 1 | 0 | 335.49 | 10.49 | 3.01 | 0.26 | 47fcd7d |
M4 Max | METAL | large-v2-q8_0 | 1 | 0 | 349.89 | 8.65 | 2.97 | 0.27 | 47fcd7d |
M4 Max | METAL | large-v3-turbo | 1 | 0 | 301.34 | 1.83 | 0.49 | 0.04 | 47fcd7d |
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M4 Max | METAL | tiny | 1 | 1 | 8.23 | 0.71 | 0.16 | 0.01 | 47fcd7d |
M4 Max | METAL | tiny-q8_0 | 1 | 1 | 8.47 | 0.67 | 0.16 | 0.01 | 47fcd7d |
M4 Max | METAL | base | 1 | 1 | 15.47 | 1.12 | 0.26 | 0.02 | 47fcd7d |
M4 Max | METAL | base-q8_0 | 1 | 1 | 15.70 | 1.05 | 0.27 | 0.02 | 47fcd7d |
M4 Max | METAL | small | 1 | 1 | 49.82 | 2.37 | 0.53 | 0.05 | 47fcd7d |
M4 Max | METAL | small-q8_0 | 1 | 1 | 51.76 | 1.99 | 0.53 | 0.05 | 47fcd7d |
M4 Max | METAL | medium | 1 | 1 | 147.76 | 5.52 | 1.27 | 0.12 | 47fcd7d |
M4 Max | METAL | medium-q8_0 | 1 | 1 | 153.98 | 4.59 | 1.24 | 0.13 | 47fcd7d |
M4 Max | METAL | large-v2 | 1 | 1 | 282.89 | 9.06 | 2.11 | 0.22 | 47fcd7d |
M4 Max | METAL | large-v2-q8_0 | 1 | 1 | 296.43 | 7.44 | 2.09 | 0.23 | 47fcd7d |
M4 Max | METAL | large-v3-turbo | 1 | 1 | 249.91 | 1.65 | 0.38 | 0.04 | 47fcd7d |
RTX 5090
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
RTX 5090 | CUDA | tiny | 1 | 0 | 2.06 | 0.55 | 0.13 | 0.00 | e4bf87b |
RTX 5090 | CUDA | tiny-q8_0 | 1 | 0 | 2.50 | 0.55 | 0.14 | 0.01 | e4bf87b |
RTX 5090 | CUDA | base | 1 | 0 | 3.72 | 0.81 | 0.19 | 0.01 | e4bf87b |
RTX 5090 | CUDA | base-q8_0 | 1 | 0 | 4.35 | 0.79 | 0.20 | 0.01 | e4bf87b |
RTX 5090 | CUDA | small | 1 | 0 | 11.24 | 1.55 | 0.38 | 0.02 | e4bf87b |
RTX 5090 | CUDA | small-q8_0 | 1 | 0 | 12.69 | 1.69 | 0.40 | 0.02 | e4bf87b |
RTX 5090 | CUDA | medium | 1 | 0 | 31.16 | 3.19 | 0.79 | 0.04 | e4bf87b |
RTX 5090 | CUDA | medium-q8_0 | 1 | 0 | 32.74 | 3.43 | 0.80 | 0.05 | e4bf87b |
RTX 5090 | CUDA | large-v2 | 1 | 0 | 50.09 | 4.55 | 1.14 | 0.05 | e4bf87b |
RTX 5090 | CUDA | large-v2-q8_0 | 1 | 0 | 52.44 | 4.76 | 1.11 | 0.07 | e4bf87b |
RTX 5090 | CUDA | large-v3-turbo | 1 | 0 | 46.78 | 0.70 | 0.17 | 0.01 | e4bf87b |
RTX 5090 | CUDA | large-v3-turbo-q8_0 | 1 | 0 | 48.57 | 0.70 | 0.16 | 0.01 | e4bf87b |
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
RTX 5090 | CUDA | tiny | 1 | 1 | 1.39 | 0.47 | 0.11 | 0.00 | e4bf87b |
RTX 5090 | CUDA | tiny-q8_0 | 1 | 1 | 1.83 | 0.48 | 0.12 | 0.01 | e4bf87b |
RTX 5090 | CUDA | base | 1 | 1 | 2.17 | 0.70 | 0.16 | 0.01 | e4bf87b |
RTX 5090 | CUDA | base-q8_0 | 1 | 1 | 2.78 | 0.68 | 0.17 | 0.01 | e4bf87b |
RTX 5090 | CUDA | small | 1 | 1 | 5.02 | 1.33 | 0.32 | 0.01 | e4bf87b |
RTX 5090 | CUDA | small-q8_0 | 1 | 1 | 6.39 | 1.46 | 0.34 | 0.02 | e4bf87b |
RTX 5090 | CUDA | medium | 1 | 1 | 13.89 | 2.68 | 0.64 | 0.03 | e4bf87b |
RTX 5090 | CUDA | medium-q8_0 | 1 | 1 | 15.40 | 2.92 | 0.67 | 0.04 | e4bf87b |
RTX 5090 | CUDA | large-v2 | 1 | 1 | 21.24 | 3.88 | 0.96 | 0.04 | e4bf87b |
RTX 5090 | CUDA | large-v2-q8_0 | 1 | 1 | 23.54 | 4.01 | 0.93 | 0.05 | e4bf87b |
RTX 5090 | CUDA | large-v3-turbo | 1 | 1 | 18.18 | 0.62 | 0.15 | 0.01 | e4bf87b |
RTX 5090 | CUDA | large-v3-turbo-q8_0 | 1 | 1 | 19.89 | 0.61 | 0.14 | 0.01 | e4bf87b |
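To read the effect of flash attention off the tables above, it can help to turn the paired FA=0 and FA=1 rows into a ratio. The following is a minimal sketch, not part of the project's bench tooling: the `ENC_MEDIUM` dictionary and the helper names are hypothetical, and the figures are the "Enc." values of the `medium` model copied from the tables. The tables do not state units, so the ratio is unitless by construction. Note that the two M2 Ultra tables were recorded at different commits (b57b9d3 and a77d11d), so that ratio may include changes other than flash attention.

```python
# "Enc." column for the medium model, copied from the benchmark tables above.
# Values are (FA=0, FA=1) timings in the tables' own (unstated) units.
# ENC_MEDIUM and the helpers below are illustrative names, not project APIs.
ENC_MEDIUM = {
    "M1 Pro": (580.06, 395.18),
    "M2 Ultra": (132.78, 89.28),   # FA=0 and FA=1 rows come from different commits
    "M4 Max": (181.50, 147.76),
    "RTX 5090": (31.16, 13.89),
}


def speedup(without_fa: float, with_fa: float) -> float:
    """Return how many times faster the FA=1 run is than the FA=0 run."""
    return without_fa / with_fa


def encoder_speedups(table: dict) -> dict:
    """Map each device to its encoder speedup, rounded to two decimals."""
    return {dev: round(speedup(off, on), 2) for dev, (off, on) in table.items()}


if __name__ == "__main__":
    for device, ratio in encoder_speedups(ENC_MEDIUM).items():
        print(f"{device}: {ratio:.2f}x")
```

The same helper can be pointed at the "Dec.", "Bch5" or "PP" columns by swapping in those values; the rows for the smallest models are rounded to two decimals in the tables, so ratios computed from them are coarse.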
What's Changed
- ci : add support for tag-based releases by @danbev in #3287
- Cmake commands "-j" flag in README.md by @toboil-features in #3284
- ci : add should_release variable by @danbev in #3288
- whisper : add version function by @danbev in #3289
- ruby : add Whisper::VERSION by @KitaitiMakoto in #3292
- ci: set fail-fast to false in docker.yml by @danbev in #3294
- ci : use selective copy for musa image by @danbev in #3296
- sync : ggml by @ggerganov in #3300
- feat: support vad for addon.node by @buxuku in #3301
- Adding dtw.params to server.cpp for v3-large-turbo by @accessiblepixel in #3307
- sync : ggml by @ggerganov in #3319
- fix and update links in whisper, command, bench and stream wasm examples by @gregsadetsky in #3318
- whisper: validate get_rows support for cpu extra buffer by @chaxu01 in #3323
- sync : ggml by @ggerganov in #3329
- bindings/go: fixed Mac OS X builds by @bvk in #3310
- feat(server): hide language probabilities option behind flag by @sachaarbonel in #3328
- musa: upgrade musa sdk to rc4.2.0 by @yeahdongcn in #3324
- ci : add paths to build.yml by @danbev in #3333
- examples : add note about WHISPER_WASM_SINGLE_FILE [no ci] by @danbev in #3332
- Support static xcframework packaging in build-xcframework.sh by @richwaters in #3322
- sync : ggml by @ggerganov in #3342
- ggml : remove old kompute, cann (skip) by @ggerganov in #3349
- whisper : reset conv scheduler when CoreML is used by @ggerganov in #3350
- stream.wasm : add language selection support by @danbev in #3354
- ruby : Add ruby binding for max_len by @adamdebono in #3365
- wasm : change ggml model host to HF by @ggerganov in #3369
- whisper : fixed crash in GPU device selection on multi-GPU systems by @Dw9 in #3372
- Update main-cuda.Dockerfile by @ustas-eth in #3371
- node : add win platform check for require path by @danbev in #3363
- sync : ggml by @ggerganov in #3383
- Update ./models/download-ggml-model.cmd to allow for tdrz download by @GalFawkes in #3381
- Ruby binding: handle negative value in padding by @Treboko in #3389
- tests : use CMake definitions for model/sample paths by @danbev in #3406
- ci : remove brew installation of cmake for macos-latest by @danbev in #3408
- whisper : prefer curl over wget in download scripts by @svmhdvn in #3409
- Fix MKL detection by quoting BLAS_INCLUDE_DIRS by @czoido in #3426
- sync : ggml by @ggerganov in #3428
- whisper : remove ggml_mul_mat padding by @ggerganov in #3436
- ci : add self-hosted workflows by @ggerganov in #3437
- bench : warm-up all kernels by @ggerganov in #3438
- bench : update [no ci] by @ggerganov in #3439
- sync : ggml by @ggerganov in #3442
- whisper : enable flash attention by default by @ggerganov in #3441
- examples : add wchess.wasm to wasm examples build by @danbev in #3443
New Contributors
- @accessiblepixel made their first contribution in #3307
- @chaxu01 made their first contribution in #3323
- @bvk made their first contribution in #3310
- @richwaters made their first contribution in #3322
- @adamdebono made their first contribution in #3365
- @Dw9 made their first contribution in #3372
- @ustas-eth made their first contribution in #3371
- @GalFawkes made their first contribution in #3381
- @Treboko made their first contribution in #3389
- @svmhdvn made their first contribution in #3409
Full Changelog: v1.7.6...v1.8.0