ggml-org/llama.cpp release b9089


SYCL: reduce allocation overhead during flash attention (#22732)

  • SYCL: reduce allocation overhead during flash attention

  • tidy up whitespace

  • add a note about the flag

  • move ggml_sycl_fattn_* into fattn-buffers.hpp

  • refactor implementation into fattn-buffers.cpp

  • move new_fattn_kv_buffers back into ggml-sycl.cpp
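
The commits above point at a buffer-reuse refactor: rather than allocating fresh K/V scratch buffers on every flash-attention call, the buffers are kept alive and grown only on demand. Below is a minimal sketch of that general pattern; the struct and `ensure_capacity` helper are hypothetical illustrations, not the actual ggml-sycl symbols (only `new_fattn_kv_buffers` appears in the commit list):

```cpp
// Hypothetical sketch of the buffer-reuse idea: reallocate only when the
// requested size grows, so repeated calls with a steady working set pay
// the allocation cost once instead of on every invocation.
#include <cstddef>
#include <cstdio>
#include <cstdlib>

struct FattnKVBuffers {          // illustrative name, not a real ggml-sycl type
    void  *data     = nullptr;   // scratch memory for the attention kernel
    size_t capacity = 0;         // bytes currently allocated

    // Grow-only allocation: a no-op when the current buffer is big enough.
    void ensure_capacity(size_t bytes) {
        if (bytes <= capacity) return;
        std::free(data);
        data     = std::malloc(bytes);
        capacity = (data != nullptr) ? bytes : 0;
    }

    ~FattnKVBuffers() { std::free(data); }
};

int main() {
    FattnKVBuffers buf;
    for (int step = 0; step < 4; ++step) {
        // Same working-set size each step: only the first iteration allocates.
        buf.ensure_capacity(1 << 20);
        std::printf("step %d: capacity = %zu bytes\n", step, buf.capacity);
    }
}
```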

Prebuilt binaries for this release are published for macOS/iOS, Linux, Android, Windows, and openEuler.
