flashinfer-ai/flashinfer v0.6.9rc1 on GitHub

What's Changed

feat: Add backend="b12x" for mm_fp4 on SM120 by @bkryu in #3051
docs: document MAX_JOBS env var and its interaction with FLASHINFER_N… by @aleozlx in #3060
PR #2772 might have introduced a device side compilation regression by @aleozlx in #3056
[feat] Add routing_replay_out support to MoE kernels and Python API by @TomerBN-Nvidia in #3024
fused_moe: pre-filter SM89 tactics with zero occupancy on SM120 Blackwell (fix review feedback on #2764) by @aniskumar-nv in #3032
feat: Add b12x CuTe DSL fused MoE for SM120 by @bkryu in #3066
CuTe DSL FP4 GEMM Heuristic by @Vinnie6167 in #2940
Support lse in trtllm paged attn kernels by @murphymatt in #3058
Revert "Support lse in trtllm paged attn kernels" by @aleozlx in #3079
docs(gdn): document -1 padding index semantics for pool+indices path by @kaixih in #3019
feat(gdn): separate input and output pool indices by @feldsherov in #2905
[CICD fix] Adjust CICD MAX_JOBS to fix OOM on H100 tests by @kahyunnam in #3078
Add qiching as code owner for autotuner files by @sricketts in #3104
Route the missing parameter for trtllm_fp8_per_tensor_scale_moe_op by @pavanimajety in #3094
Fix: Extend b12x FP4 GEMM support to SM121 (GB10/DGX Spark) by @meena-at-work in #3113
Add parallel attention by @xueweilnvidia in #2630
[feat] Faster topk algorithm by @Aalanli in #3009
feat: Add b12x_fused_moe / B12xMoEWrapper SM120 APIs with micro kernel and ReLU2 by @bkryu in #3080
[fmhav2] skip fp8 tests and add warning by @jimmyzho in #3050
feat: implement configurable tie_break for filtered topk by @zianglih in #3095
Add custom tuning buckets and rounding direction to autotune() by @vadiklyutiy in #2958
[CuTe DSL] Fix FP8 MLA persistent perf regression and ProxyKind cu13 wheel breakage by @pgera in #3132

New Contributors

@TomerBN-Nvidia made their first contribution in #3024
@aniskumar-nv made their first contribution in #3032
@Vinnie6167 made their first contribution in #2940
@meena-at-work made their first contribution in #3113
@xueweilnvidia made their first contribution in #2630
@Aalanli made their first contribution in #3009
@vadiklyutiy made their first contribution in #2958

Full Changelog: v0.6.8rc1...v0.6.9rc1
