ggml-org/llama.cpp: release b7723


HIP: add fattn-mma-f16 for RDNA4 (#18481)

  • finish VQ mma

  • flash_attn_ext_f16_iter

  • KQ_rowsum

  • correct exp

  • fix scale error

  • fix softmax scale

  • fix softmax scale

  • enable fattn on cpu side

  • fix random error

  • disable fattn-mma-f16 on rdna3

  • fix wrong col for rdna

  • use identity mat to transpose

  • resolve conflicts

  • basic tuning for DeepSeek-R1-Distill-Qwen-1.5B

  • fix volta compile error

  • align rdna4 policy for fattn

  • adjust fattn policy

  • adjust kernel selection logic

  • update per review comments

  • keep fattn-wmma logic

  • adjust kernel selection logic
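Several of the commits above ("KQ_rowsum", "correct exp", the repeated "fix softmax scale") concern the online-softmax rescaling step that FlashAttention-style kernels perform per row. The sketch below is a minimal CPU reference of that accumulation, not the actual fattn-mma-f16 HIP kernel; only the names KQ_max and KQ_rowsum are taken from the commit messages, everything else is illustrative.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// CPU reference sketch of online softmax: maintain a running row maximum
// (KQ_max) and a running sum of exponentials (KQ_rowsum), rescaling the
// previously accumulated terms whenever the maximum grows. Getting the
// rescale factor wrong is the "fix softmax scale" class of bug.
std::vector<float> online_softmax(const std::vector<float>& scores) {
    float KQ_max    = -INFINITY; // running row maximum
    float KQ_rowsum = 0.0f;      // running sum of exp(score - KQ_max)
    std::vector<float> numer(scores.size());

    for (size_t i = 0; i < scores.size(); ++i) {
        const float KQ_max_new = fmaxf(KQ_max, scores[i]);
        // Rescale everything accumulated so far to the new maximum.
        const float scale = expf(KQ_max - KQ_max_new);
        KQ_rowsum = KQ_rowsum * scale + expf(scores[i] - KQ_max_new);
        for (size_t j = 0; j < i; ++j) {
            numer[j] *= scale;
        }
        numer[i] = expf(scores[i] - KQ_max_new);
        KQ_max = KQ_max_new;
    }
    // Final normalization by the row sum gives the softmax probabilities.
    for (float& v : numer) {
        v /= KQ_rowsum;
    }
    return numer;
}
```

In the real kernel this runs tile by tile with the accumulator kept in registers, which is why a wrong rescale factor only shows up as subtly incorrect attention output rather than an outright crash.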


Co-authored-by: zhang hui <you@example.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

Prebuilt binaries are available for macOS/iOS, Linux, Windows, and openEuler.
