What's Changed
- fix: use LSE accum strides from params instead of hardcoded ones by @ZeronSix in #2388
- [Sm75] Add README link for initial Turing support by @ssiu in #2379
- [Cute,Sm100,Bwd] refine bwd swizzle for deterministic by @jayhshah in #2390
- Fix edge case when tag has no delta from previous by @drisspg in #2394
- [AMD ROCm] Update CK and add RDNA 3/4 support by @rocking5566 in #2400
- [Ai-assisted] CLC work stealing by @drisspg in #2218
- Various bug fixes / enable subtile > 2 by @drisspg in #2411
- Add to varlen by @drisspg in #2346
- Allow compact block sparse index tensors by @jduprat in #2417
New Contributors
Full Changelog: fa4-v4.0.0.beta5...fa4-v4.0.0.beta7