facebookresearch/xformers v0.0.28
FAv3, profiler update & AMD


Pre-built binary wheels require PyTorch 2.4.1

Added

  • Added wheels for CUDA 12.4
  • Added conda builds for Python 3.11
  • Added wheels for ROCm 6.1

Improved

  • Profiler: Fix computation of FLOPS for the attention when using xFormers
  • Profiler: Fix MFU/HFU calculation when multiple dtypes are used
  • Profiler: Trace analysis to compute MFU & HFU is now much faster
  • fMHA/splitK: Fixed NaN in the output when using a torch.Tensor bias in which many consecutive keys are masked with -inf
  • Updated the Flash-Attention version to v2.6.3 when building from scratch
  • With the most recent Flash-Attention version, mixing it with the cutlass backend is no longer supported: the cutlass forward can no longer be paired with the Flash backward
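For context on the MFU/HFU fix above: MFU (Model FLOPs Utilization) counts only the FLOPs the model mathematically requires, while HFU (Hardware FLOPs Utilization) also counts recomputation performed on the hardware (e.g. from activation checkpointing). A minimal sketch of the ratio both metrics share; the function name and numbers here are illustrative assumptions, not the xformers profiler API:

```python
def utilization(achieved_flop, elapsed_s, peak_flop_per_s):
    """Fraction of peak hardware throughput actually achieved.

    For MFU, achieved_flop is the model's required FLOPs; for HFU it
    additionally includes recomputed FLOPs, so HFU >= MFU for one run.
    """
    return achieved_flop / (elapsed_s * peak_flop_per_s)

# Hypothetical run: 1e15 FLOPs in 2 s on a device with 1e15 FLOP/s peak
print(utilization(1e15, 2.0, 1e15))  # 0.5, i.e. 50% utilization
```

Mixing dtypes complicates this because peak FLOP/s differs per dtype (e.g. fp16 vs fp32), which is what the calculation fix addresses.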
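The splitK NaN fix above concerns bias rows where every key is masked with -inf: an otherwise-correct softmax then has no finite entry to normalize against and produces NaN. A minimal pure-Python sketch of the mechanism (not the xformers kernel itself):

```python
import math

def softmax(scores):
    """Max-shifted softmax over one row of attention scores."""
    m = max(scores)                    # -inf when every key is masked
    shifted = [s - m for s in scores]  # (-inf) - (-inf) evaluates to NaN
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]   # NaN propagates to every output

row = [float("-inf")] * 4  # a bias row masking all consecutive keys
print(softmax(row))        # [nan, nan, nan, nan]
```

Fixed kernels typically special-case such fully-masked rows (e.g. emitting zeros) instead of letting NaN propagate.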

Removed

  • fMHA: Removed decoder and small_k backends
  • profiler: Removed the DetectSlowOpsProfiler
  • Removed compatibility with PyTorch < 2.4
  • Removed conda builds for Python 3.9
  • Removed Windows pip wheels for CUDA 12.1 and 11.8
