github ggml-org/llama.cpp b7812


mla : make the V tensor a view of K (#18986)

  • mla : pass V as a view of K to the FA op

  • cuda : adjust mla logic to new layout

  • kv-cache : fix rope shift

  • tests : remove comment

  • cuda : fix reusable_cutoff

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>



macOS/iOS:

Linux:

Windows:

openEuler:
