ggml-org/llama.cpp b7812 on GitHub

5 months ago

mla : make the V tensor a view of K (#18986)

mla : pass V as a view of K to the FA op
cuda : adjust mla logic to new layout
kv-cache : fix rope shift
tests : remove comment
cuda : fix reusable_cutoff
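
The headline change passes V to the flash-attention (FA) op as a view of K instead of as a separate tensor. The sketch below illustrates only the general idea of such a view: a second tensor descriptor that aliases part of an existing buffer without copying. It assumes, which these notes do not state, that the V data occupies a contiguous column range inside each cached K row. `StridedView`, `make_column_view`, and the offset and sizes used are hypothetical names and values for illustration, not ggml or llama.cpp API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2-D strided view over float data (illustration only, not a ggml type).
struct StridedView {
    float *     data;       // pointer to the first element visible through the view
    std::size_t n_rows;     // number of rows (e.g. cached tokens)
    std::size_t n_cols;     // elements visible per row
    std::size_t row_stride; // elements between consecutive row starts in the underlying buffer

    // Element access; the view does not own the memory it points to.
    float & at(std::size_t row, std::size_t col) const {
        assert(row < n_rows && col < n_cols);
        return data[row * row_stride + col];
    }
};

// Expose n_cols elements of every row of parent, starting at column col_offset.
// Nothing is copied: the result aliases the parent's storage and keeps its row stride.
inline StridedView make_column_view(const StridedView & parent,
                                    std::size_t col_offset,
                                    std::size_t n_cols) {
    assert(col_offset + n_cols <= parent.n_cols);
    return StridedView{ parent.data + col_offset, parent.n_rows, n_cols, parent.row_stride };
}
```

With a K buffer of 3 rows by 6 columns, `make_column_view(k, 0, 4)` would give a V view of 3 rows by 4 columns that keeps the row stride of 6, so a write through `k.at(2, 1)` is immediately visible through `v.at(2, 1)`. Under this assumption, a kernel consuming both tensors has to honor V's row stride rather than treat V as densely packed.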

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
