What's Changed
- typo: fix pdl terminology by @yzh119 in #933
- Fix "specutate" typo by @markmc in #934
- typo: fix target_probs docs after uniform_samples removal by @markmc in #935
- typo: remove another uniform samples leftover by @markmc in #937
- Fix/precommit issues by @diptorupd in #931
- ci: setup Jenkins by @yzh119 in #874
- bugfix: fix include header name conflict by @yzh119 in #939
- fix: Fix MLA TVM binding for the latest changes by @MasterJH5574 in #940
- feat - support mla kvcache store by @baowendin in #888
- Add POD-Attention to FlashInfer by @AKKamath in #858
- bugfix: fix potential issues of FA3 template loading nans for PageAttention by @yzh119 in #945
- fix - fix bug when not relevant seq has nan data by @baowendin in #942
- misc: add ci-badge, update blog list by @yzh119 in #948
- bugfix: Fix missing PyModuleDef field initializers by @sampan26 in #946
- fix: fix pod-attention compilation time by @yzh119 in #954
- bugfix: bugfix to #949 by @yzh119 in #951
- misc: Temporarily disable POD from AOT wheels by @abcdabcd987 in #956
- ci: improve jenkins by @yzh119 in #943
- Fix compilation on cuda 12.2 by @goliaro in #961
- doc: remove misleading docstring about non_blocking by @yzh119 in #966
- perf: reduce torch.library dispatch overhead by @yzh119 in #968
- [TVM] Added tvm binding for sampling kernel by @annanyapr in #958
- perf: Fix python API overhead when CUDAGraph is not enabled by @yzh119 in #969
- Fix POD JIT bugs by @AKKamath in #971
- benchmark: add sampling.renorm benchmarks by @xslingcn in #970
- perf: dual pivot top-p/top-k renorm by @xslingcn in #974
- perf: Use 2WG pipeline design for MLA implementation on Hopper by @yzh119 in #952
- release: bump version to v0.2.4 by @yzh119 in #980
New Contributors
- @markmc made their first contribution in #934
- @diptorupd made their first contribution in #931
- @AKKamath made their first contribution in #858
- @sampan26 made their first contribution in #946
- @goliaro made their first contribution in #961
- @annanyapr made their first contribution in #958
Full Changelog: v0.2.3...v0.2.4