What's Changed
- Fix noisy logger by @drisspg in #2414
- [AMD ROCm] Fix NaN in FMHA BWD when seq_q=0 by @rocking5566 in #2421
- Add FA4 CI: GitHub Actions workflow with Apptainer on B200 runner by @Johnsonms in #2393
- Fix some CI bugs by @Johnsonms in #2423
- [ROCm] Fix Windows issues by @micmelesse in #2385
- Fix: add [cu13] extra to dev install instructions for CUDA 13 / B200 systems by @Johnsonms in #2430
- Fix: disable 2-CTA backward mode when block_sparse_tensors is used by @jduprat in #2433
- CI: extend FA4 test matrix with causal/non-causal correctness tests and fwd+bwd benchmarks at seqlen 1K-32K by @Johnsonms in #2428
Full Changelog: fa4-v4.0.0.beta7...fa4-v4.0.0.beta8