facebookresearch/xformers v0.0.23
Bug fixes and improvements in `memory_efficient_attention`


Pre-built binary wheels require PyTorch 2.1.1

Fixed

  • fMHA: Fixed a bug in the cutlass backend forward pass where the logsumexp was not computed correctly, producing wrong results in the backward pass. This could happen with MQA when a sequence has a query whose `length % 64 == 1`
  • fMHA: Updated Flash-Attention to v2.3.6, which fixes a performance regression in causal backward passes and adds support for `BlockDiagonalCausalWithOffsetPaddedKeysMask`

Added

  • fMHA: Added `LocalAttentionFromBottomRightMask` (local attention)
  • fMHA: Added `LowerTriangularFromBottomRightMask` (causal)
  • fMHA: Added `LowerTriangularFromBottomRightLocalAttentionMask` (local + causal); see the usage sketch after this list
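
The new bottom-right-aligned biases plug into `memory_efficient_attention` through the `attn_bias` argument. Below is a minimal sketch; the tensor shapes, window sizes, and the `window_left`/`window_right` constructor arguments are illustrative assumptions, so check the `xformers.ops.fmha.attn_bias` docstrings of the installed release before relying on them.

```python
# Minimal sketch (assumes a CUDA GPU and xformers >= 0.0.23).
import torch
import xformers.ops as xops
from xformers.ops.fmha import attn_bias

B, M, H, K = 2, 128, 8, 64  # batch, sequence length, heads, head dim
q = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
k = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
v = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)

# Causal mask aligned to the bottom-right corner of the attention matrix,
# which matters when the query is shorter than the key (e.g. decoding).
causal = attn_bias.LowerTriangularFromBottomRightMask()
out_causal = xops.memory_efficient_attention(q, k, v, attn_bias=causal)

# Local (sliding-window) attention, also bottom-right aligned.
# The window_left / window_right argument names are assumptions.
local = attn_bias.LocalAttentionFromBottomRightMask(window_left=64, window_right=0)
out_local = xops.memory_efficient_attention(q, k, v, attn_bias=local)
```

`LowerTriangularFromBottomRightLocalAttentionMask` combines the two behaviours (local + causal) in a single bias.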

Removed

  • Removed `xformers.triton.sum_strided`
