Details
ggml-hexagon: flash-attn opt (#19025)
- optimize flash attention kernel by improving score computation and online softmax update
- wip
- Refactor online softmax update in flash attention kernel for improved performance
- Optimize flash attention kernel by replacing float array with HVX_Vector for score computation
- wip
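
The "online softmax update" these commits refer to is the streaming softmax used by flash attention: rather than materializing the full score array, the kernel keeps a running maximum and running sum and rescales the previously accumulated output as each new block of scores arrives. Below is a minimal scalar C sketch of that technique for reference only; the names (`OnlineSoftmaxState`, `online_softmax_step`) are illustrative, and the actual ggml-hexagon kernel vectorizes these loops with `HVX_Vector` registers rather than scalar floats.

```c
// Minimal scalar sketch of the online softmax update used in flash attention.
// Illustrative only: the real ggml-hexagon kernel keeps the score block in
// HVX_Vector registers instead of a scalar float loop.
#include <math.h>
#include <stddef.h>

typedef struct {
    float m;   // running maximum of all scores seen so far
    float l;   // running sum of exp(score - m)
} OnlineSoftmaxState;

// Initialize with { .m = -INFINITY, .l = 0.0f } and acc[] zeroed.
// Folds one block of n scores (and the matching rows of V, each of length d)
// into the running state, rescaling the partial output accumulator `acc`.
static void online_softmax_step(OnlineSoftmaxState *st,
                                const float *scores, size_t n,
                                const float *v, size_t d, float *acc) {
    // 1. Maximum of the incoming score block.
    float m_blk = scores[0];
    for (size_t i = 1; i < n; ++i) {
        if (scores[i] > m_blk) m_blk = scores[i];
    }

    // 2. New running maximum and correction factor for the old state.
    float m_new = st->m > m_blk ? st->m : m_blk;
    float corr  = expf(st->m - m_new);

    // 3. Rescale the previously accumulated output and sum.
    for (size_t j = 0; j < d; ++j) acc[j] *= corr;
    float l_new = st->l * corr;

    // 4. Accumulate the current block with weights exp(score - m_new).
    for (size_t i = 0; i < n; ++i) {
        float p = expf(scores[i] - m_new);
        l_new += p;
        for (size_t j = 0; j < d; ++j) acc[j] += p * v[i * d + j];
    }

    st->m = m_new;
    st->l = l_new;
}
// After the last block, dividing acc[j] by st->l gives the attention output.
```

Keeping the score block in `HVX_Vector` form, as the commit about replacing the float array describes, lets the score computation and this softmax update stay in vector registers instead of round-tripping through scalar memory.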
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: