What's Changed
- bugfix: fix persistent kernel precision for masked output by @Edenzzzz in #1533
- ci: create docker image for cu126/cu128/cu129 by @yzh119 in #1558
- bugfix: fix some typos in persistent kernel by @Edenzzzz in #1562
- fix: separate out fp4 lib into sm90 and sm100 versions, add oob checking in fused moe by @djmmoss in #1565
- bugfix: fix persistent attention kernel correctness on blackwell by @yzh119 in #1559
- ci: add unit tests for different CUDA versions by @yzh119 in #1560
- release: bump version to v0.2.14.post1 by @yzh119 in #1568

Full Changelog: v0.2.14...v0.2.14.post1