Dao-AILab/flash-attention fa4-v4.0.0.beta12

Pre-release · 9 hours ago

What's Changed

  • Fix long MSVC linker commands on Windows by @jammm in #2517
  • Fix test_flash_attn_fast varlen call after qv positional insert by @henrylhtsang in #2527
  • [Cute,Bwd,Sm90] Fix determinism for GQA, port Sm100 approach by @v0i0 in #2510
  • benchmarks/tune_ex2_emu: hd256 sweep support and clock lock/unlock by @Johnsonms in #2495
  • [FA4][hd256] Backward TMA bulk-store epilogue + LSE/dpsum coalesce by @Johnsonms in #2497
  • [hd256] Add TMA paged KV support to SM100 2CTA forward kernel by @Johnsonms in #2489
  • Deterministic backward for blocksparse impl by @drisspg in #2253

Full Changelog: fa4-v4.0.0.beta11...fa4-v4.0.0.beta12
