ggml-org/llama.cpp b9085


Add flash attention MMA / Tiles to support MiMo-V2.5 (#22812)

  • mimo-v2.5: add flash attention mma/tiles for d_kq=192 d_v=128

  • mimo-v2.5: follow (256, 256) fattn templates

  • mimo-v2.5: cleanup comments

  • mimo-v2.5: further comment cleanup

  • mimo-v2.5: address PR feedback
    - fix GQA handling
    - check for other dangling 320/576 carveouts and mirror them for 192
    - add to backend ops test so new paths are covered
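
For context, the headline change adds FlashAttention MMA/tile kernel coverage for the asymmetric head sizes MiMo-V2.5 uses (K/Q head dim 192, V head dim 128). The sketch below is a minimal, hedged illustration of how such a case can be expressed through the public ggml_flash_attn_ext API; the sequence lengths, GQA ratio, and types are illustrative assumptions, not the exact test case this PR adds to the backend ops test.

```cpp
// Hedged sketch: building a FlashAttention node with d_kq=192 / d_v=128 and GQA
// via the public ggml API. Shapes/parameters are illustrative assumptions only.
#include "ggml.h"
#include <cmath>

int main() {
    // no_alloc context: tensors carry only metadata while the graph is described
    ggml_init_params params = {
        /*.mem_size   =*/ 16u * 1024u * 1024u,
        /*.mem_buffer =*/ nullptr,
        /*.no_alloc   =*/ true,
    };
    ggml_context * ctx = ggml_init(params);

    const int64_t d_kq      = 192;  // K/Q head size covered by the new kernels
    const int64_t d_v       = 128;  // V head size covered by the new kernels
    const int64_t n_tok     = 32;   // illustrative number of query tokens
    const int64_t n_kv      = 256;  // illustrative KV cache length
    const int64_t n_head    = 8;    // illustrative query heads
    const int64_t n_head_kv = 2;    // GQA: 4 query heads share each KV head

    // ggml convention: ne0 is the per-head dimension
    ggml_tensor * q = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, d_kq, n_tok, n_head,    1);
    ggml_tensor * k = ggml_new_tensor_4d(ctx, GGML_TYPE_F16, d_kq, n_kv,  n_head_kv, 1);
    ggml_tensor * v = ggml_new_tensor_4d(ctx, GGML_TYPE_F16, d_v,  n_kv,  n_head_kv, 1);

    const float scale = 1.0f / std::sqrt((float) d_kq);

    // no mask, no ALiBi bias, no logit softcap; the output head size equals d_v
    ggml_tensor * out = ggml_flash_attn_ext(ctx, q, k, v, /*mask*/ nullptr, scale, 0.0f, 0.0f);
    (void) out;

    ggml_free(ctx);
    return 0;
}
```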

Prebuilt binaries are attached for macOS/iOS, Linux, Android, Windows, and openEuler.
