Details
ggml-cuda: native bf16 flash attention for vec kernel (#20525)
- ggml-cuda: native bf16 flash attention for vec and tile kernels
  (the mma kernel still converts bf16 to fp16 before launch; native bf16 mma is a TODO)
- ggml-cuda: address code owner review feedback
  (reverted the tile kernel changes to avoid a larger refactor)
- fix CI failures on Turing and HIP
- fix bf16 vec kernel compile on HIP v_dot2 platforms
- add comments
Co-authored-by: Johannes Gäßler johannesg@5d6.de
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: