ggml-org/llama.cpp b9085


Add flash attention MMA / Tiles to support MiMo-V2.5 (#22812)

  • mimo-v2.5: add flash attention mma/tiles for d_kq=192 d_v=128

  • mimo-v2.5: follow (256, 256) fattn templates

  • mimo-v2.5: cleanup comments

  • mimo-v2.5: further comment cleanup

  • mimo-v2.5: address PR feedback
    - fix GQA handling
    - check for other dangling 320/576 carveouts and mirror them for 192
    - add to backend ops test so new paths are covered
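
For context, the headline change adds FlashAttention MMA/tile kernel coverage for the asymmetric head sizes MiMo-V2.5 uses (K/Q head dim 192, V head dim 128). The sketch below is a minimal, hedged illustration of how such a case can be expressed through the public ggml_flash_attn_ext API; the sequence lengths, GQA ratio, and types are illustrative assumptions, not the exact test case this PR adds to the backend ops test.

```cpp
// Hedged sketch: building a FlashAttention node with d_kq=192 / d_v=128 and GQA
// via the public ggml API. Shapes/parameters are illustrative assumptions only.
#include "ggml.h"
#include <cmath>

int main() {
    // no_alloc context: tensors carry only metadata while the graph is described
    ggml_init_params params = {
        /*.mem_size   =*/ 16u * 1024u * 1024u,
        /*.mem_buffer =*/ nullptr,
        /*.no_alloc   =*/ true,
    };
    ggml_context * ctx = ggml_init(params);

    const int64_t d_kq      = 192;  // K/Q head size covered by the new kernels
    const int64_t d_v       = 128;  // V head size covered by the new kernels
    const int64_t n_tok     = 32;   // illustrative number of query tokens
    const int64_t n_kv      = 256;  // illustrative KV cache length
    const int64_t n_head    = 8;    // illustrative query heads
    const int64_t n_head_kv = 2;    // GQA: 4 query heads share each KV head

    // ggml convention: ne0 is the per-head dimension
    ggml_tensor * q = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, d_kq, n_tok, n_head,    1);
    ggml_tensor * k = ggml_new_tensor_4d(ctx, GGML_TYPE_F16, d_kq, n_kv,  n_head_kv, 1);
    ggml_tensor * v = ggml_new_tensor_4d(ctx, GGML_TYPE_F16, d_v,  n_kv,  n_head_kv, 1);

    const float scale = 1.0f / std::sqrt((float) d_kq);

    // no mask, no ALiBi bias, no logit softcap; the output head size equals d_v
    ggml_tensor * out = ggml_flash_attn_ext(ctx, q, k, v, /*mask*/ nullptr, scale, 0.0f, 0.0f);
    (void) out;

    ggml_free(ctx);
    return 0;
}
```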

Prebuilt binaries are attached for macOS/iOS, Linux, Android, Windows, and openEuler.
