ggml-org/llama.cpp b8470


ggml-cuda: native bf16 flash attention for vec kernel (#20525)

  • ggml-cuda: native bf16 flash attention for vec and tile kernels

    The mma kernel still converts bf16 to fp16 before launch; native bf16 mma support remains a TODO (hedged sketches of both paths follow below).

  • ggml-cuda: address code owner review feedback

    Reverted the tile kernel changes to avoid a larger refactor.

  • fix CI failures on Turing and HIP

  • fix bf16 vec kernel compile on HIP platforms with v_dot2

  • add comments
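
The vec kernel change means the Q·K dot products at the heart of flash attention can run on bf16 data directly instead of detouring through fp16. As a hedged illustration only (not llama.cpp's actual kernel), a minimal native bf16 dot product with the cuda_bf16.h intrinsics could look like the following; it assumes compute capability 8.0+, a one-warp launch per dot product, and an even head dimension so values pack into __nv_bfloat162:

    #include <cuda_bf16.h>

    // Hypothetical sketch: packed bf16 dot product of one Q row against one
    // K column, the core op of a flash-attention vec kernel. __hfma2 keeps
    // the math in bf16 end to end (sm_80+); production kernels typically
    // widen to fp32 accumulators for long sequences.
    __global__ void qk_dot_bf16(const __nv_bfloat162 * __restrict__ q,
                                const __nv_bfloat162 * __restrict__ k,
                                float * __restrict__ out, const int n2) {
        __nv_bfloat162 acc = __float2bfloat162_rn(0.0f);
        for (int i = threadIdx.x; i < n2; i += warpSize) {
            acc = __hfma2(q[i], k[i], acc); // native bf16 FMA, no fp16 detour
        }
        float sum = __low2float(acc) + __high2float(acc);
        // butterfly reduction so lane 0 holds the full dot product
        for (int mask = warpSize/2; mask > 0; mask >>= 1) {
            sum += __shfl_xor_sync(0xffffffff, sum, mask);
        }
        if (threadIdx.x == 0) {
            *out = sum;
        }
    }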


Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
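
For the mma path, by contrast, bf16 inputs are still converted to fp16 before launch. A sketch of what such a pre-launch conversion pass amounts to (illustrative names and launch shape, not llama.cpp's code):

    #include <cuda_bf16.h>
    #include <cuda_fp16.h>

    // Illustrative only: elementwise bf16 -> fp16 conversion of the kind an
    // fp16-only mma path needs before launch. The round trip goes through
    // fp32; bf16 magnitudes above fp16's max (~65504) would overflow, one
    // reason native bf16 mma support is listed as a TODO.
    __global__ void convert_bf16_to_fp16(const __nv_bfloat16 * __restrict__ src,
                                         half * __restrict__ dst, const int n) {
        const int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            dst[i] = __float2half(__bfloat162float(src[i]));
        }
    }

The extra pass costs a full read and write of the tensor, which is the overhead a native bf16 mma kernel would remove.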

Binary builds: macOS/iOS, Linux, Windows, openEuler.