What's Changed
- Default to Shared Object by @jithunnair-amd in #33
- Add varlen support to AOTriton's Flash Attention by @xinyazhang in #31
- Switch to upstream Triton compiler, and related changes by @xinyazhang in #36
- Improve Backward Performance and Experimental Navi31 Support by @xinyazhang in #39
  - Introduce a new tuning system based on pre-compiled GPU kernels
  - Navi 31 support is still experimental
- Support hipGraph usage in PyTorch by @xinyazhang in #40
  - This changes the RNG API used by the Flash Attention (FA) kernels
  - Switch to a new testing scheme to match the changes in PyTorch 2.5
New Contributors
- @jithunnair-amd made their first contribution in #33
Full Changelog: 0.6b...0.7b