HIP: add fattn-mma-f16 for RDNA4 (#18481)
- finish VQ mma
- flash_attn_ext_f16_iter
- KQ_rowsum
- correct exp
- fix scale error
- fix softmax scale
- fix softmax scale
- enable fattn on cpu side
- fix random error
- disable fattn-mma-f16 on rdna3
- fix wrong col for rdna
- use identity mat to transpose
- resolve conflicts
- basic tuning for DeepSeek-R1-Distill-Qwen-1.5B
- fix volta compile error
- align rdna4 policy for fattn
- adjust fattn policy
- adjust kernel selection logic
- update as the review comments
- keep fattn-wmma logic
- adjust kernel selection logic
Co-authored-by: zhang hui <you@example.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: