flashinfer-ai/flashinfer — Release v0.6.6


What's Changed

  • fix: move ArtifactPath/CheckSumHash imports inside gen_moe_utils_modu… by @dierksen in #2681
  • Enable sm120f compilation by @kahyunnam in #2650
  • Ensure -gencode flags are in deterministic order (for ccache) by @benbarsdell in #2674
  • int16 Block-Scaled State and Stochastic Rounding for SSU (mamba) by @ishovkun in #2645
  • feat: add pool+indices support to gated_delta_rule_decode_pretranspose (bf16 path) by @kaixih in #2619
  • chore: replace bare print() with logging across the package by @esmeetu in #2648
  • fix: reduce smem allocation for tinygemm2 kernel in SM120 by @jimmyzho in #2670
  • [chore] bench_moe_deepseek.py allows adjusting expert distribution by @rosenrodt in #2678
  • feat: add support for more MLA head dimensions by @hypdeb in #2677
  • [fp8_blockwise] Fix int32 overflow in TRTLLM fused MoE activation kernel by @charlotte12l in #2642
  • Give knam codeowner override for Qwen3.5 (gdn) related directories by @kahyunnam in #2680
  • HOTFIX: Skip mamba Stochastic Rounding tests on sm_120 by @ishovkun in #2699
  • chore: Update CODEOWNERS by @flashinfer-bot in #2712
  • feat: support mxfp4 & mxfp8 entrypoint for blackwell cutedsl dense gemm by @b8zhong in #2660
  • Undo fix to AutoTuner find_nearest_profile by @danisereb in #2697
  • Experiment Add @kahyunnam as co-owner for several files by @aleozlx in #2713
  • chore: Update CODEOWNERS by @flashinfer-bot in #2719
  • Implement cutlass_fused_moe mxfp8 by @zianglih in #2581

Full Changelog: v0.6.5...v0.6.6
