ggml-org/llama.cpp release b9089


SYCL: reduce allocation overhead during flash attention (#22732)

  • SYCL: reduce allocation overhead during flash attention

  • tidy up whitespace

  • add a note about the flag

  • move ggml_sycl_fattn_* into fattn-buffers.hpp

  • refactor implementation into fattn-buffers.cpp

  • move new_fattn_kv_buffers back into ggml-sycl.cpp
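
The commits above point at a buffer-reuse refactor: rather than allocating fresh K/V scratch buffers on every flash-attention call, the buffers are kept alive and grown only on demand. Below is a minimal sketch of that general pattern; the struct and `ensure_capacity` helper are hypothetical illustrations, not the actual ggml-sycl symbols (only `new_fattn_kv_buffers` appears in the commit list):

```cpp
// Hypothetical sketch of the buffer-reuse idea: reallocate only when the
// requested size grows, so repeated calls with a steady working set pay
// the allocation cost once instead of on every invocation.
#include <cstddef>
#include <cstdio>
#include <cstdlib>

struct FattnKVBuffers {          // illustrative name, not a real ggml-sycl type
    void  *data     = nullptr;   // scratch memory for the attention kernel
    size_t capacity = 0;         // bytes currently allocated

    // Grow-only allocation: a no-op when the current buffer is big enough.
    void ensure_capacity(size_t bytes) {
        if (bytes <= capacity) return;
        std::free(data);
        data     = std::malloc(bytes);
        capacity = (data != nullptr) ? bytes : 0;
    }

    ~FattnKVBuffers() { std::free(data); }
};

int main() {
    FattnKVBuffers buf;
    for (int step = 0; step < 4; ++step) {
        // Same working-set size each step: only the first iteration allocates.
        buf.ensure_capacity(1 << 20);
        std::printf("step %d: capacity = %zu bytes\n", step, buf.capacity);
    }
}
```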

Prebuilt binaries for this release are published for macOS/iOS, Linux, Android, Windows, and openEuler.
