Dao-AILab/flash-attention fa4-v4.0.0.beta8 on GitHub

What's Changed

Fix noisy logger by @drisspg in #2414
[AMD ROCm] Fix NaN in FMHA BWD when seq_q=0 by @rocking5566 in #2421
Add FA4 CI: GitHub Actions workflow with Apptainer on B200 runner by @Johnsonms in #2393
Fix some CI bugs by @Johnsonms in #2423
[ROCm] Fix Windows issues by @micmelesse in #2385
fix: add [cu13] extra to dev install instructions for CUDA 13 / B200 systems by @Johnsonms in #2430
Fix: disable 2-CTA backward mode when block_sparse_tensors is used by @jduprat in #2433
CI: extend FA4 test matrix with causal/non-causal correctness and fwd+bwd benchmark seqlen 1K-32K by @Johnsonms in #2428

Full Changelog: fa4-v4.0.0.beta7...fa4-v4.0.0.beta8