ggml-org/llama.cpp b7723
on GitHub

one month ago

Details

HIP: add fattn-mma-f16 for RDNA4 (#18481)

finish VQ mma
flash_attn_ext_f16_iter
KQ_rowsum
correct exp
fix scale error
fix softmax scale
fix softmax scale
enable fattn on cpu side
fix random error
disable fattn-mma-f16 on rdna3
fix wrong col for rdna
use identity mat to transpose
resolve conflicts
basic tuning for DeepSeek-R1-Distill-Qwen-1.5B
fix volta compile error
align rdna4 policy for fattn
adjust fattn policy
adjust kernel selection logic
update as the review comments
keep fattn-wmma logic
adjust kernel selection logic

Co-authored-by: zhang hui <you@example.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
