ggml-org/llama.cpp release b8690


vulkan: add FA dequant for q4_1, q5_0, q5_1, iq4_nl (#21029)

Add dequantize4() implementations for Q4_1, Q5_0, Q5_1, and IQ4_NL
in the flash attention base shader. Register them in the shader
generator and in pipeline creation, and enable them in the
scalar/coopmat1 FA support check.
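
For context, below is a minimal GLSL sketch of what a dequantize4() helper for Q4_1 could look like, assuming ggml's block_q4_1 layout (fp16 scale d, fp16 min m, 32 packed 4-bit quants) and the decode x = d * q + m. The binding, struct, and function names here are illustrative assumptions, not the PR's actual shader code; the other formats differ mainly in layout and decode (Q5_0/Q5_1 carry an extra high bit per quant, and IQ4_NL maps the 4-bit index through a non-linear lookup table).

```glsl
#version 450
#extension GL_EXT_shader_explicit_arithmetic_types_float16 : require
#extension GL_EXT_shader_16bit_storage : require

// Assumed Q4_1 block layout, mirroring ggml's block_q4_1: fp16 scale d,
// fp16 min m, and 32 packed 4-bit quants stored as four 32-bit words.
struct block_q4_1 {
    float16_t d;
    float16_t m;
    uint32_t  qs[4];   // 32 x 4-bit quants, 8 per word
};

// Hypothetical K/V buffer binding; the real shader's interface differs.
layout (binding = 0, std430) readonly buffer KV { block_q4_1 blocks[]; } kv;

// Dequantize 4 consecutive values starting at `idx` (a multiple of 4, 0..28)
// within block `ib`. ggml stores the low nibble of byte i as value i and the
// high nibble as value i + 16; each value decodes as x = d * q + m.
vec4 dequantize4_q4_1(uint ib, uint idx) {
    const float d = float(kv.blocks[ib].d);
    const float m = float(kv.blocks[ib].m);
    vec4 v;
    for (uint k = 0; k < 4; ++k) {
        const uint i    = idx + k;
        const uint byt  = i & 15u;                       // which of the 16 bytes
        const uint word = kv.blocks[ib].qs[byt >> 2];    // 4 bytes per 32-bit word
        const uint b    = (word >> (8u * (byt & 3u))) & 0xFFu;
        const uint q    = (i < 16u) ? (b & 0xFu) : (b >> 4u);
        v[k] = d * float(q) + m;
    }
    return v;
}
```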

Release binaries: macOS/iOS, Linux, Windows, openEuler.
