flashinfer-ai/flashinfer v0.2.8rc1 on GitHub

What's Changed

[fix] fix BatchAttention CTA_TILE_KV mask issue by @happierpig in #1206
feat: enable and update all-reduce fused quantization by @yyihuang in #1164
Fix the issue with auxiliary kernel launch and grid dim calculation by @Anerudhan in #1208
Fix test_groupwise_scaled_gemm_fp8.py by @jinyangyuan-nvidia in #1211
[TVM] Remove enable_pdl from TVM binding interface by @MasterJH5574 in #1217
misc: minor adds in readme by @yyihuang in #1218
bugfix: fix blackwell fmha hanging issue for empty kv_len by @yzh119 in #1198
update trtllm-gen decode attention kernel launcher by @wenscarl in #1189
Handle allocation cutlass fused MoE output to caller by @wenscarl in #1225
Fix missing hash in the cudnn cubin path by @Anerudhan in #1227
bugfix: add logits processor to pyproject.toml by @yzh119 in #1224
fix: add trtllm-allreduce-fusion api notes and fix memory error by @yyihuang in #1229
feat: Add non-causal cudnn prefill kernels by @Anerudhan in #1230
minor: update oneshot handling, add params notes by @yyihuang in #1232
Enable cudnn decode and add tests for the cudnn decode kernel by @Anerudhan in #1221
docker: add cuda-python to CI docker image by @yzh119 in #1233
bugfix: Fix building without get_requires*() invocation by @mgorny in #1226
bugfix: support uint8_t for vec_t class template by @chenyang78 in #1234

Full Changelog: v0.2.7.post1...v0.2.8rc1