What's Changed
- fix(hd256/sm100): make q/k/v contiguous before dedicated hd256 kernel by @yunweili3 in #2666
- [Cute,Bwd,Sm100] add sparse MLA (DeepSeek v4) backward kernels by @jayhshah in #2621
New Contributors
- @yunweili3 made their first contribution in #2666
Full Changelog: fa4-v4.0.0.beta18...fa4-v4.0.0.beta19