## What's Changed
- Vendor CCCL v3.3.2 from GitHub instead of relying on CTK-bundled copy by @kahyunnam in #3091
- [Fmha] Add head_dim=512 support for trtllm attention kernels by @djmmoss in #2959 (usage sketch after this list)
- perf: optimize MXFP4xBF16 & INT4xFP8 CUTLASS MoE backend for SM90 by @samuellees in #3084
- Add support for combinations of allreduce, allgather, and reducescatter by @jinyangyuan-nvidia in #2563 (conceptual sketch after this list)
- [Fmha] update trtllm-gen FMHA cubins and sync headers for context SWA fix by @PerkzZheng in #3089
- Report unit test files with no result by @dierksen in #3105
- autotuner: check cache before synthesizing profile input tensors by @leejnau in #3126
- fix: pass skip_softmax_threshold_scale_factor to prefill wrapper in test by @PerkzZheng in #3154
- feat: Add DCP All-to-All kernel for context-parallel attention reduction by @davidjpyu in #2951 (communication-pattern sketch after this list)
- perf(autotuner): replace power-of-2 token buckets with hybrid spacing & fix missing routing_replay_out arg by @StudyingShao in #3115 (bucket-spacing sketch after this list)
- feat: Integrate CuTe DSL FMHA prefill kernels by loading cubin by @limin2021 in #3039
- Unify all-reduce memory allocation for single-node and multi-node NVLink by @Amir-19 in #2955
- Add all-gather matmul by @kwen2501 in #2665
- Add examples of calling FlashInfer from JAX via jax-tvm-ffi by @katjasrz in #3092
- bump version to 0.6.9 by @aleozlx in #3123
- chore: Address non-blocking review feedback for #3051 / #3080 by @bkryu in #3128
- perf: Add no-bias path for tinygemm_bf16 by @bkryu in #3151
- feat: Add `row_starts` and `dsa_graph_safe` to topk by @zianglih in #3133
- fix: guard MXFP8 fc1 weight shape check for non-gated activations by @ianliuy in #3082
- [fix] fix Blackwell GDN accuracy issue by @Observer007 in #3156
- fix: out-of-bounds (OOB) issue for vLLM by @nv-yunzheq in #2762
- Build mnnvl_moe_alltoall with logger and stringUtils by @tiran in #2807
- CI/CD bug fix: ensure data/ symlinks exist before jit-cache AOT compilation by @kahyunnam in #3158
- feat: add `.fi_trace` getter interface for flashinfer-trace by @yyihuang in #2931
- fix(gdn): use physical SM count for SM100 persistent prefill kernel by @arpera in #3155
- fix(gdn): address remaining CodeRabbit feedback from #3001 by @arpera in #3165
- Support NVFP4 KV for prefill and batch attention kernels by @Tom-Zheng in #3097
- fix: skip version check for editable/source installs (0.0.0+unknown) by @ianliuy in #3061
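
The head_dim=512 support from #2959 is exercised through the existing attention entry points. A minimal sketch, assuming a recent FlashInfer build on a supported GPU; the shapes are illustrative, and whether the trtllm-gen path is actually selected depends on backend dispatch for the current device:

```python
import torch
import flashinfer

# Illustrative shapes; head_dim=512 is the newly supported size (#2959).
qo_len, kv_len, num_heads, head_dim = 128, 1024, 8, 512
q = torch.randn(qo_len, num_heads, head_dim, dtype=torch.bfloat16, device="cuda")
k = torch.randn(kv_len, num_heads, head_dim, dtype=torch.bfloat16, device="cuda")
v = torch.randn(kv_len, num_heads, head_dim, dtype=torch.bfloat16, device="cuda")

# Whether the trtllm-gen kernel handles this call is decided by FlashInfer's
# backend dispatch; the public API itself is unchanged.
out = flashinfer.single_prefill_with_kv_cache(q, k, v, causal=True)
```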
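The fused allreduce/allgather/reducescatter combinations from #2563 are easiest to reason about against the plain collectives they combine. A conceptual reference using `torch.distributed` (this is not FlashInfer's fused API; process-group setup and shapes are assumptions):

```python
import torch
import torch.distributed as dist

def comm_combo(x: torch.Tensor):
    """Reference semantics only, using plain NCCL collectives.

    Assumes dist.init_process_group("nccl") has been called, x has the
    same shape on every rank, and dim 0 is divisible by the world size.
    """
    world = dist.get_world_size()
    # allreduce: every rank ends with the elementwise sum.
    summed = x.clone()
    dist.all_reduce(summed, op=dist.ReduceOp.SUM)
    # allgather: every rank ends with all ranks' x stacked along dim 0.
    gathered = x.new_empty(world * x.shape[0], *x.shape[1:])
    dist.all_gather_into_tensor(gathered, x)
    # reducescatter: each rank keeps one summed shard along dim 0.
    shard = x.new_empty(x.shape[0] // world, *x.shape[1:])
    dist.reduce_scatter_tensor(shard, x, op=dist.ReduceOp.SUM)
    return summed, gathered, shard
```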
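The DCP All-to-All kernel from #2951 fuses communication with the context-parallel attention reduction. As a conceptual reference for the communication pattern only (not the kernel's API; `dcp_exchange` and its layout are hypothetical):

```python
import torch
import torch.distributed as dist

def dcp_exchange(partials: torch.Tensor) -> torch.Tensor:
    """Hypothetical reference: exchange equally sized partial-attention
    shards between context-parallel ranks. Slice i of `partials` along
    dim 0 is destined for rank i; assumes an initialized process group
    and dim 0 divisible by the world size. The new kernel additionally
    fuses the attention reduction into this exchange.
    """
    out = torch.empty_like(partials)
    dist.all_to_all_single(out, partials)
    return out
```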
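The hybrid token-bucket spacing from #3115 replaces pure power-of-2 buckets so large token counts stop rounding up by nearly 2x. The autotuner's actual schedule is not reproduced here; a hypothetical illustration of the idea (power-of-2 growth up to a cutoff, then fixed linear steps):

```python
def hybrid_buckets(max_tokens: int, pow2_limit: int = 256, step: int = 256) -> list[int]:
    """Hypothetical hybrid bucket schedule: power-of-2 sizes up to
    pow2_limit, then fixed linear steps, bounding over-allocation for
    large token counts. All names and defaults here are illustrative."""
    buckets = []
    n = 1
    while n <= min(pow2_limit, max_tokens):
        buckets.append(n)
        n *= 2
    n = pow2_limit + step
    while n <= max_tokens:
        buckets.append(n)
        n += step
    if buckets[-1] < max_tokens:
        buckets.append(max_tokens)
    return buckets

# e.g. hybrid_buckets(1024) -> [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 768, 1024]
```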
## New Contributors
- @davidjpyu made their first contribution in #2951
- @StudyingShao made their first contribution in #3115
- @kwen2501 made their first contribution in #2665
- @katjasrz made their first contribution in #3092
- @ianliuy made their first contribution in #3082
- @arpera made their first contribution in #3155
**Full Changelog**: v0.6.9rc1...v0.6.10rc1