github Dao-AILab/flash-attention fa4-v4.0.0.beta17

latest release: v2.8.3.post1
pre-release, 4 hours ago

What's Changed

  • [Triton] Fix graph capture issues and env var by @micmelesse in #2620
  • [CuTe,Bwd,Sm100] allow 2cta with score mod and mask mod in bwd by @reubenconducts in #2557
  • [CuTe] Fix lint failures by @drisspg in #2625
  • [CuTe] Fix lint failure in flash_bwd_sm100.py by @Johnsonms in #2627
  • fix: add weights_only=True to all torch.load call sites by @aryanputta in #2622
  • [Cute,Sm100,Fwd] use correction warps if not tma store; remove outdated packgqa guard by @jayhshah in #2629
  • Add aux-scalars to interface to enable dynamic ints and floats in expressions by @drisspg in #2616
  • fix: build and select cu13.2 prebuilt wheels by @ko3n1g in #2618
  • ci(fa4): enforce cutlass-dsl/quack dep floors and rebake cu130 image by @Johnsonms in #2636
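
A note on #2622: `weights_only=True` tells `torch.load` to use a restricted unpickler that accepts tensors, primitive types, and containers instead of arbitrary pickled objects. The sketch below is not PyTorch's implementation. It illustrates the same idea with the standard library's `pickle` module only, and the names `SAFE_GLOBALS`, `RestrictedUnpickler`, `restricted_loads`, and `Exploit` are hypothetical, introduced for illustration.

```python
import io
import pickle

# Hypothetical allowlist for this sketch; PyTorch maintains its own list.
SAFE_GLOBALS = {
    ("builtins", "dict"),
    ("builtins", "list"),
    ("collections", "OrderedDict"),
}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global not on the allowlist."""

    def find_class(self, module, name):
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is not allowed")


def restricted_loads(data: bytes):
    """Load pickled bytes, rejecting payloads that reference unlisted globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Exploit:
    """Stand-in for a malicious payload: unpickling it would call print()."""

    def __reduce__(self):
        return (print, ("arbitrary code ran during unpickling",))


if __name__ == "__main__":
    # Plain containers of numbers need no globals, so they load normally.
    print(restricted_loads(pickle.dumps({"w": [1.0, 2.0]})))

    # The payload references builtins.print, which is not allowlisted.
    try:
        restricted_loads(pickle.dumps(Exploit()))
    except pickle.UnpicklingError as err:
        print("rejected:", err)
```

In PyTorch itself the equivalent call is `torch.load(path, weights_only=True)`. A checkpoint that stores custom Python objects may fail to load in this mode until those types are explicitly allowed.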

New Contributors

Full Changelog: fa4-v4.0.0.beta16...fa4-v4.0.0.beta17
