github Dao-AILab/flash-attention fa4-v4.0.0.beta10

Pre-release

What's Changed

  • Disable 2CTA fwd non-causal on CUDA 12 to work around codegen regression by @Johnsonms in #2461
  • Add CLC scheduler heuristic by @drisspg in #2455
  • Expose num_splits for FA2 and add option for kernel blocksize alignment by @liangel-02 in #2448
  • [Cute,Fwd,Sm100] fp8 e4m3 and e5m2 support by @dcw02 in #2109
  • Expose --pack-gqa and --num-splits in benchmark_attn.py by @Johnsonms in #2473
  • Fix: pass num_splits through varlen_fwd Python wrapper (fixes #2448 regression) by @hsyysy in #2476
  • [Cute,Fwd,Sm100] Fix the crash when seqlen_k=0 by @Johnsonms in #2470
  • Fix causal calcs by @drisspg in #2463
  • [cute,bwd] fix PDL race in bwd_preprocess, which was corrupting dpsum on SM90+ by @geruome in #2481
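Several entries above touch num_splits, which controls split-KV attention: the KV sequence is partitioned into chunks, each chunk produces a partial attention output plus its log-sum-exp, and the partials are merged exactly by rescaling. As a hedged illustration only (this is a NumPy sketch of the combine math, not the library's kernel or API; `attn_reference` and `attn_split` are hypothetical names), the idea can be shown for a single query vector:

```python
import numpy as np

def attn_reference(q, k, v):
    # Standard softmax attention for one query: q (d,), k/v (n, d).
    s = k @ q / np.sqrt(q.shape[0])
    p = np.exp(s - s.max())
    return (p / p.sum()) @ v

def attn_split(q, k, v, num_splits):
    # Split-KV sketch: each chunk computes a locally normalized partial
    # output and its log-sum-exp; partials are merged exactly by
    # softmax-weighting the per-split LSEs.
    d = q.shape[0]
    outs, lses = [], []
    for ks, vs in zip(np.array_split(k, num_splits),
                      np.array_split(v, num_splits)):
        s = ks @ q / np.sqrt(d)
        m = s.max()
        p = np.exp(s - m)
        outs.append((p / p.sum()) @ vs)    # locally normalized partial output
        lses.append(m + np.log(p.sum()))   # this split's log-sum-exp
    lses = np.array(lses)
    w = np.exp(lses - lses.max())          # relative softmax mass per split
    return np.sum(np.stack(outs) * w[:, None], axis=0) / w.sum()
```

Because the per-split log-sum-exps make the rescaling exact, the merged result matches the unsplit computation up to floating-point error, regardless of how the KV sequence is partitioned.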

New Contributors

Full Changelog: fa4-v4.0.0.beta9...fa4-v4.0.0.beta10
