Overview
Massive performance improvements for the Metal backend, especially for beam search with more than one beam and for quantized models.
Marking this as a "pre-release" since there have been major changes to the build system (it now uses CMake) and I want to gather feedback on how well the project builds on various platforms. Please leave comments in the discussion to help fix any remaining issues before the official release.
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 Ultra | Metal | tiny | 1 | 1 | 7.90 | 1.26 | 0.35 | 0.01 | ed733e8 |
M2 Ultra | Metal | tiny-q5_0 | 1 | 1 | 8.44 | 1.23 | 0.36 | 0.01 | ed733e8 |
M2 Ultra | Metal | tiny-q5_1 | 1 | 1 | 8.26 | 1.27 | 0.37 | 0.01 | ed733e8 |
M2 Ultra | Metal | tiny-q8_0 | 1 | 1 | 8.03 | 1.21 | 0.35 | 0.01 | ed733e8 |
M2 Ultra | Metal | base | 1 | 1 | 13.77 | 1.80 | 0.42 | 0.02 | ed733e8 |
M2 Ultra | Metal | base-q5_0 | 1 | 1 | 15.02 | 1.72 | 0.42 | 0.02 | ed733e8 |
M2 Ultra | Metal | base-q5_1 | 1 | 1 | 14.93 | 1.74 | 0.42 | 0.02 | ed733e8 |
M2 Ultra | Metal | base-q8_0 | 1 | 1 | 14.26 | 1.68 | 0.41 | 0.02 | ed733e8 |
M2 Ultra | Metal | small | 1 | 1 | 39.76 | 3.54 | 0.85 | 0.05 | ed733e8 |
M2 Ultra | Metal | small-q5_0 | 1 | 1 | 45.07 | 3.47 | 0.87 | 0.05 | ed733e8 |
M2 Ultra | Metal | small-q5_1 | 1 | 1 | 44.82 | 3.49 | 0.87 | 0.05 | ed733e8 |
M2 Ultra | Metal | small-q8_0 | 1 | 1 | 41.79 | 3.30 | 0.84 | 0.05 | ed733e8 |
M2 Ultra | Metal | medium | 1 | 1 | 106.73 | 7.28 | 1.78 | 0.11 | ed733e8 |
M2 Ultra | Metal | medium-q5_0 | 1 | 1 | 124.43 | 6.63 | 1.83 | 0.12 | ed733e8 |
M2 Ultra | Metal | medium-q5_1 | 1 | 1 | 124.19 | 6.70 | 1.84 | 0.12 | ed733e8 |
M2 Ultra | Metal | medium-q8_0 | 1 | 1 | 113.88 | 6.52 | 1.75 | 0.11 | ed733e8 |
M2 Ultra | Metal | medium-dis | 1 | 1 | 94.97 | 0.97 | 0.22 | 0.01 | ed733e8 |
M2 Ultra | Metal | large-v2 | 1 | 1 | 193.33 | 10.53 | 2.65 | 0.20 | ed733e8 |
M2 Ultra | Metal | large-v2-q5_0 | 1 | 1 | 229.22 | 9.52 | 2.72 | 0.23 | ed733e8 |
M2 Ultra | Metal | large-v2-q5_1 | 1 | 1 | 229.40 | 9.62 | 2.73 | 0.23 | ed733e8 |
M2 Ultra | Metal | large-v2-q8_0 | 1 | 1 | 207.30 | 9.36 | 2.59 | 0.21 | ed733e8 |
M2 Ultra | Metal | large-v2-dis | 1 | 1 | 171.43 | 1.09 | 0.25 | 0.02 | ed733e8 |
M2 Ultra | Metal | large-v3-turbo | 1 | 1 | 173.45 | 1.73 | 0.41 | 0.03 | ed733e8 |
M2 Ultra | Metal | large-v3-turbo-q5_0 | 1 | 1 | 205.52 | 1.52 | 0.42 | 0.04 | ed733e8 |
M2 Ultra | Metal | large-v3-turbo-q8_0 | 1 | 1 | 185.90 | 1.48 | 0.40 | 0.03 | ed733e8 |
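
To make the table easier to read, the following is a minimal Python sketch that compares a few `large-v2` variants against the unquantized model. The numbers are copied from the `Enc.` and `Dec.` columns above; the helper name `ratio_to_baseline` is hypothetical, and the sketch assumes both columns use the same time unit for every row, so only the ratios matter.

```python
# Values copied from the benchmark table: model -> (Enc., Dec.)
timings = {
    "large-v2": (193.33, 10.53),
    "large-v2-q5_0": (229.22, 9.52),
    "large-v2-q8_0": (207.30, 9.36),
    "large-v2-dis": (171.43, 1.09),
}


def ratio_to_baseline(timings, baseline):
    """Return (encoder ratio, decoder ratio) per model relative to `baseline`.

    A ratio above 1.0 means slower than the baseline, below 1.0 means faster.
    """
    base_enc, base_dec = timings[baseline]
    return {
        model: (round(enc / base_enc, 2), round(dec / base_dec, 2))
        for model, (enc, dec) in timings.items()
    }


ratios = ratio_to_baseline(timings, "large-v2")
for model, (enc_ratio, dec_ratio) in ratios.items():
    print(f"{model:15s} enc x{enc_ratio:.2f}  dec x{dec_ratio:.2f}")
```

For these rows, the quantized variants come out slower than the baseline in the encoder column and faster in the decoder column.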
What's Changed
- sync : ggml by @ggerganov in #2573
- ruby : Follow source tree change by @KitaitiMakoto in #2580
- Add `q8_0` models to `download-ggml-model.sh` by @mrienstra in #2589
- ruby : Add low-level methods to transcribe by @KitaitiMakoto in #2585
- sync : ggml by @ggerganov in #2608
Full Changelog: v1.7.2...v1.7.3-pre