github Dao-AILab/flash-attention fa4-v4.0.0.beta16

pre-release, 6 hours ago

What's Changed

  • Bump AITER submodule to commit 3b2e6f4 by @sstamenk in #2540
  • Clamp kv_stage to avoid SMEM overflow for small head_dims on SM100 by @Johnsonms in #2594
  • [CuTe,Sm100] fix: decode/prefill exp2 emulation consistency by @Luosuu in #2595
  • NFC: replace deprecated APIs: cute.make_fragment and cute.core.ThrMma by @brandon-yujie-sun in #2602
  • Bump nvidia-cutlass-dsl to >=4.5.2 and quack-kernels to >=0.5.0 by @Johnsonms in #2605
  • [CuTe,Fwd,Sm100] refactor mla sm100 forward and add page table by @jayhshah in #2558
  • ci: bump Jimver/cuda-toolkit to v0.2.35 for CUDA 13.2 support by @ko3n1g in #2617
  • [ROCm] Bump Triton to >=3.6.0 and update aiter submodule by @micmelesse in #2614
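
Several entries above raise minimum dependency versions (nvidia-cutlass-dsl >=4.5.2, quack-kernels >=0.5.0, and Triton >=3.6.0 on ROCm). The following is a minimal sketch for checking a local environment against those floors. The helper names (`release_tuple`, `meets_minimum`, `check_environment`) are hypothetical and not part of flash-attention. It assumes the distribution names match the names used in the changelog, which may not hold for every Triton build on ROCm. It compares only the leading numeric part of each version, so pre-release and local suffixes are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions as stated in the changelog entries above.
# Assumption: the installed distribution names match these keys.
MINIMUMS = {
    "nvidia-cutlass-dsl": "4.5.2",
    "quack-kernels": "0.5.0",
    "triton": "3.6.0",  # relevant to ROCm builds; distribution name may differ
}


def release_tuple(v: str) -> tuple[int, ...]:
    """Return the leading numeric release segment, e.g. '3.6.0+rocm' -> (3, 6, 0)."""
    m = re.match(r"\d+(?:\.\d+)*", v)
    return tuple(int(p) for p in m.group(0).split(".")) if m else ()


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare numeric release segments, padding the shorter one with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    n = max(len(a), len(b))
    a += (0,) * (n - len(a))
    b += (0,) * (n - len(b))
    return a >= b


def check_environment(minimums: dict[str, str] = MINIMUMS) -> dict[str, str]:
    """Map each distribution name to a short status string."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
            continue
        if meets_minimum(installed, minimum):
            report[name] = f"{installed} (ok)"
        else:
            report[name] = f"{installed} (too old, need >={minimum})"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```

For strict PEP 440 ordering, including pre-release handling, the `packaging` library's `Version` class is a better fit than this numeric-only comparison.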

New Contributors

Full Changelog: fa4-v4.0.0.beta15...fa4-v4.0.0.beta16
