github flashinfer-ai/flashinfer v0.6.9rc1
Release v0.6.9rc1

What's Changed

  • feat: Add backend="b12x" for mm_fp4 on SM120 by @bkryu in #3051
  • docs: document MAX_JOBS env var and its interaction with FLASHINFER_N… by @aleozlx in #3060
  • PR #2772 might have introduced a device side compilation regression by @aleozlx in #3056
  • [feat] Add routing_replay_out support to MoE kernels and Python API by @TomerBN-Nvidia in #3024
  • fused_moe: pre-filter SM89 tactics with zero occupancy on SM120 Blackwell (fix review feedback on #2764) by @aniskumar-nv in #3032
  • feat: Add b12x CuTe DSL fused MoE for SM120 by @bkryu in #3066
  • CuTe DSL FP4 GEMM Heuristic by @Vinnie6167 in #2940
  • Support lse in trtllm paged attn kernels by @murphymatt in #3058
  • Revert "Support lse in trtllm paged attn kernels" by @aleozlx in #3079
  • docs(gdn): document -1 padding index semantics for pool+indices path by @kaixih in #3019
  • feat(gdn): separate input and output pool indices by @feldsherov in #2905
  • [CICD fix] Adjust CICD MAX_JOBS to fix OOM on H100 tests by @kahyunnam in #3078
  • Add qiching as code owner for autotuner files by @sricketts in #3104
  • Route the missing parameter for trtllm_fp8_per_tensor_scale_moe_op by @pavanimajety in #3094
  • Fix: Extend b12x FP4 GEMM support to SM121 (GB10/DGX Spark) by @meena-at-work in #3113
  • Add parallel attention by @xueweilnvidia in #2630
  • [feat] Faster topk algorithm by @Aalanli in #3009
  • feat: Add b12x_fused_moe / B12xMoEWrapper SM120 APIs with micro kernel and ReLU2 by @bkryu in #3080
  • [fmhav2] skip fp8 tests and add warning by @jimmyzho in #3050
  • feat: implement configurable tie_break for filtered topk by @zianglih in #3095
  • Add custom tuning buckets and rounding direction to autotune() by @vadiklyutiy in #2958
  • [CuTe DSL] Fix FP8 MLA persistent perf regression and ProxyKind cu13 wheel breakage by @pgera in #3132

Full Changelog: v0.6.8rc1...v0.6.9rc1