What's Changed
- Fix ZeroDivisionError in num_splits_heuristic for empty Q workloads by @shivam2199 in #2515
- [Cute, flex, sm90] fix sm90 flex by @geruome in #2563
- Split out varlen batch search into utils by @reubenconducts in #2556
- [Cute,Sm100] allow for zero-length sequences in hdim 256 kernels by @jayhshah in #2568
- Enable split-kv for blocksparse tensors by @drisspg in #2536
New Contributors
- @shivam2199 made their first contribution in #2515
Full Changelog: fa4-v4.0.0.beta13...fa4-v4.0.0.beta14