What's Changed
- fix: move ArtifactPath/CheckSumHash imports inside gen_moe_utils_modu… by @dierksen in #2681
- Enable sm120f compilation by @kahyunnam in #2650
- Ensure -gencode flags are in deterministic order (for ccache) by @benbarsdell in #2674
- int16 Block-Scaled State and Stochastic Rounding for SSU (mamba) by @ishovkun in #2645
- feat: add pool+indices support to gated_delta_rule_decode_pretranspose (bf16 path) by @kaixih in #2619
- chore: replace bare print() with logging across the package by @esmeetu in #2648
- fix: reduce smem allocation for tinygemm2 kernel in SM120 by @jimmyzho in #2670
- [chore] bench_moe_deepseek.py allows adjusting expert distribution by @rosenrodt in #2678
- feat: add support for more MLA head dimensions by @hypdeb in #2677
- [fp8_blockwise] Fix int32 overflow in TRTLLM fused MoE activation kernel by @charlotte12l in #2642
- Give knam codeowner override for Qwen3.5 (gdn) related directories by @kahyunnam in #2680
- HOTFIX: Skip mamba Stochastic Rounding tests on sm_120 by @ishovkun in #2699
- chore: Update CODEOWNERS by @flashinfer-bot in #2712
- feat: support mxfp4 & mxfp8 entrypoint for blackwell cutedsl dense gemm by @b8zhong in #2660
- Undo fix to AutoTuner find_nearest_profile by @danisereb in #2697
- Experiment: Add @kahyunnam as co-owner for several files by @aleozlx in #2713
- chore: Update CODEOWNERS by @flashinfer-bot in #2719
- Implement cutlass_fused_moe mxfp8 by @zianglih in #2581
New Contributors
- @benbarsdell made their first contribution in #2674
- @charlotte12l made their first contribution in #2642
- @zianglih made their first contribution in #2581
Full Changelog: v0.6.5...v0.6.6