Dao-AILab/flash-attention fa4-v4.0.0.beta8

Pre-release · 6 hours ago

What's Changed

  • fix noisy logger by @drisspg in #2414
  • [AMD ROCm] Fix NaN in FMHA BWD when seq_q=0 by @rocking5566 in #2421
  • Add FA4 CI: GitHub Actions workflow with Apptainer on B200 runner by @Johnsonms in #2393
  • Fix some bugs of CI by @Johnsonms in #2423
  • [ROCM] Fix windows issues by @micmelesse in #2385
  • fix: add [cu13] extra to dev install instructions for CUDA 13 / B200 systems by @Johnsonms in #2430 (see the sketch after this list)
  • Fix: disable 2-CTA backward mode when block_sparse_tensors is used by @jduprat in #2433
  • CI: extend FA4 test matrix with causal/non-causal correctness and fwd+bwd benchmark seqlen 1K-32K by @Johnsonms in #2428
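
For context on #2430, a minimal sketch of what a dev install selecting the CUDA 13 extra might look like. Only the extra name `[cu13]` comes from the PR title; the clone URL and editable-install form are assumptions based on standard pip usage, not the repository's exact instructions:

```bash
# Hypothetical sketch: editable dev install with the CUDA 13 extra.
# The [cu13] extra name is from #2430; the rest is standard pip workflow.
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
pip install -e ".[cu13]"   # select CUDA 13 / B200 dependencies via the extra
```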

Full Changelog: fa4-v4.0.0.beta7...fa4-v4.0.0.beta8
