ggml-org/llama.cpp b7820

ggml-hexagon: flash-attn opt (#19025)

  • Optimize flash attention kernel by improving score computation and online softmax update

  • Refactor online softmax update in flash attention kernel for improved performance

  • Optimize flash attention kernel by replacing float array with HVX_Vector for score computation

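The first and third bullets refer to the "online softmax" technique used by flash-attention kernels: rather than computing every attention score, finding the maximum, and normalizing in a second pass, the kernel keeps a running maximum and a running sum of exponentials and rescales its partial output whenever the maximum grows. Below is a minimal scalar sketch of that update, for illustration only; the names (softmax_state, online_softmax_step) are hypothetical and not from the repository, and the actual ggml-hexagon kernel performs this per block of scores held in HVX_Vector registers, which is what the last bullet refers to.

```c
// Minimal scalar sketch of an online softmax update (illustrative only,
// not the ggml-hexagon HVX code; names here are hypothetical).
#include <math.h>
#include <stddef.h>

typedef struct {
    float m;    // running maximum of all scores seen so far
    float l;    // running sum of exp(score - m)
} softmax_state;

// Fold one score and its value row into the output accumulator without
// materializing the full score array: rescale previous contributions when
// the running maximum grows, then add the newly weighted value row.
static void online_softmax_step(softmax_state * st, float * acc, size_t dv,
                                float score, const float * v) {
    const float m_new = score > st->m ? score : st->m;
    const float scale = expf(st->m - m_new);  // rescales earlier contributions
    const float p     = expf(score - m_new);  // weight of the incoming value row

    for (size_t i = 0; i < dv; ++i) {
        acc[i] = acc[i] * scale + p * v[i];
    }
    st->l = st->l * scale + p;
    st->m = m_new;
}

// Initialize with st->m = -INFINITY, st->l = 0, acc zeroed; the final
// attention output row is acc[i] / st->l once all KV positions are folded in.
```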

Release assets: macOS/iOS, Linux, Windows, openEuler
