Triton 3.7.1 is a patch release on top of 3.7.0. It fixes the two regressions listed below and contains no new features or API changes.
Regression fixes
- Add async read dependencies to FenceAsync — a missing fence between a shared-memory store (st.shared) and an async copy_local_to_global could let the async copy read shared memory before the store completed, producing incorrect results. FenceAsync now inserts the required fence. (#9610)
- [InstCombine] Shrink added constant using LHS known zeros — fixes an LLVM InstCombine miscompilation where add simplification used known-zero bits only from the RHS, mishandling the symmetric case where the LHS has known zeros and the low bits are unused. Picked up by Triton through its pinned LLVM. (llvm/llvm-project#174380)
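
The arithmetic behind the InstCombine fix can be checked in a few lines. The sketch below is not LLVM code: the 8-bit width and the helper names (`shrink_constant`, `high_bits`, `shrinking_is_safe`) are assumptions made for illustration. It brute-forces the underlying property: when one operand of an add has its low `k` bits known to be zero and the low `k` bits of the result are unused, the low `k` bits of the other operand can be cleared without changing any used bit. Because addition is commutative, the property holds whether the known-zero operand is the LHS or the RHS.

```python
WIDTH = 8                      # toy bit width, chosen so the check is exhaustive
MASK = (1 << WIDTH) - 1


def shrink_constant(c, k):
    """Clear the low k bits of c (hypothetical helper, not an LLVM API)."""
    return c & ~((1 << k) - 1) & MASK


def high_bits(value, k):
    """Return the result bits at position k and above, i.e. the used bits."""
    return (value & MASK) >> k


def shrinking_is_safe(k):
    """Compare x + c with x + shrunk(c) for every x whose low k bits are zero."""
    for x in range(0, 1 << WIDTH, 1 << k):      # x has k known-zero low bits
        for c in range(1 << WIDTH):             # every possible constant
            if high_bits(x + c, k) != high_bits(x + shrink_constant(c, k), k):
                return False
    return True


if __name__ == "__main__":
    print(all(shrinking_is_safe(k) for k in range(WIDTH)))
```

The check passes because the cleared low bits of the constant are smaller than `2**k` and are added to a value whose low `k` bits are zero, so no carry can reach the used bits. If the other operand does not have those known zeros, the shrink is no longer valid: with `x = 1`, `c = 1`, `k = 1`, the original sum has bit 1 set and the shrunk sum does not.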