Details
CUDA: Add Flash Attention Support for Head Dimension 512 (#20998)
- flash attention support for head dimension 512 added
- FA D=512 - match 576 configs, limit ncols2, revert vec cap
- fix HIP tile kernel build for D=512
- fix HIP tile kernel occupancy for D=512 on AMD
- Apply suggestions from code review
  Co-authored-by: Johannes Gäßler johannesg@5d6.de
- fix tile FA compilation
  Co-authored-by: Johannes Gäßler johannesg@5d6.de
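For context, the "head dimension" in these commits is the per-head vector width d in scaled dot-product attention, softmax(QKᵀ/√d)·V; this PR enables d = 512 in the CUDA flash-attention kernels. A minimal plain-Python reference sketch of that computation (illustrative only, not the CUDA kernel and not llama.cpp code):

```python
import math
import random

HEAD_DIM = 512  # the head dimension newly supported by the PR's kernels

def attention(q, ks, vs):
    """Attend one query vector q over lists of key/value vectors ks, vs."""
    scale = 1.0 / math.sqrt(HEAD_DIM)
    # Scaled dot-product logits, one per key
    logits = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in ks]
    m = max(logits)  # subtract the max before exp for numerical stability
    weights = [math.exp(x - m) for x in logits]
    total = sum(weights)
    weights = [w / total for w in weights]  # softmax over the keys
    # Weighted sum of value vectors -> output vector of length HEAD_DIM
    return [sum(w * v[i] for w, v in zip(weights, vs))
            for i in range(HEAD_DIM)]

random.seed(0)
q = [random.gauss(0, 1) for _ in range(HEAD_DIM)]
ks = [[random.gauss(0, 1) for _ in range(HEAD_DIM)] for _ in range(8)]
vs = [[random.gauss(0, 1) for _ in range(HEAD_DIM)] for _ in range(8)]
out = attention(q, ks, vs)
print(len(out))  # one output element per head dimension
```

The real flash-attention kernels compute the same result but tile Q/K/V through GPU shared memory with an online softmax, which is why a larger head dimension like 512 requires the occupancy and tile-configuration fixes listed above.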
macOS/iOS:
Linux:
- Ubuntu x64 (CPU)
- Ubuntu arm64 (CPU)
- Ubuntu s390x (CPU)
- Ubuntu x64 (Vulkan)
- Ubuntu arm64 (Vulkan)
- Ubuntu x64 (ROCm 7.2)
- Ubuntu x64 (OpenVINO)
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: