## What's Changed
- Vendor CCCL v3.3.2 from GitHub instead of relying on CTK-bundled copy by @kahyunnam in #3091
- [Fmha] Add head_dim=512 support for trtllm attention kernels by @djmmoss in #2959 (usage sketch after this list)
- perf: optimize MXFP4xBF16 & INT4xFP8 CUTLASS MoE backend for SM90 by @samuellees in #3084
- Add support for combinations of allreduce, allgather, and reducescatter by @jinyangyuan-nvidia in #2563 (conceptual sketch after this list)
- [Fmha] update trtllm-gen FMHA cubins and sync headers for context SWA fix by @PerkzZheng in #3089
- Report unit test files with no result by @dierksen in #3105
- autotuner: check cache before synthesizing profile input tensors by @leejnau in #3126
- fix: pass skip_softmax_threshold_scale_factor to prefill wrapper in test by @PerkzZheng in #3154
- feat: Add DCP All-to-All kernel for context-parallel attention reduction by @davidjpyu in #2951 (communication-pattern sketch after this list)
- perf(autotuner): replace power-of-2 token buckets with hybrid spacing & fix missing routing_replay_out arg by @StudyingShao in #3115 (bucket-spacing sketch after this list)
- feat: Integrate CuTe DSL FMHA prefill kernels by loading cubin by @limin2021 in #3039
- Unify all-reduce memory allocation for single-node and multi-node NVLink by @Amir-19 in #2955
- Add all-gather matmul by @kwen2501 in #2665
- Add examples of calling FlashInfer from JAX via jax-tvm-ffi by @katjasrz in #3092
- bump version to 0.6.9 by @aleozlx in #3123
- chore: Address non-blocking review feedback for #3051 / #3080 by @bkryu in #3128
- perf: Add no-bias path for tinygemm_bf16 by @bkryu in #3151
- feat: Add `row_starts` and `dsa_graph_safe` to topk by @zianglih in #3133
- fix: guard MXFP8 fc1 weight shape check for non-gated activations by @ianliuy in #3082
- [fix] fix Blackwell GDN accuracy issue by @Observer007 in #3156
- fix: out-of-bounds (OOB) issue for vLLM by @nv-yunzheq in #2762
- Build mnnvl_moe_alltoall with logger and stringUtils by @tiran in #2807
- CI/CD bug fix: ensure data/ symlinks exist before jit-cache AOT compilation by @kahyunnam in #3158
- feat: add `.fi_trace` getter interface for flashinfer-trace by @yyihuang in #2931
- fix(gdn): use physical SM count for SM100 persistent prefill kernel by @arpera in #3155
- fix(gdn): address remaining CodeRabbit feedback from #3001 by @arpera in #3165
- Support NVFP4 KV for prefill and batch attention kernels by @Tom-Zheng in #3097
- fix: skip version check for editable/source installs (0.0.0+unknown) by @ianliuy in #3061
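
The head_dim=512 support from #2959 is exercised through the existing attention entry points. A minimal sketch, assuming a recent FlashInfer build on a supported GPU; the shapes are illustrative, and whether the trtllm-gen path is actually selected depends on backend dispatch for the current device:

```python
import torch
import flashinfer

# Illustrative shapes; head_dim=512 is the newly supported size (#2959).
qo_len, kv_len, num_heads, head_dim = 128, 1024, 8, 512
q = torch.randn(qo_len, num_heads, head_dim, dtype=torch.bfloat16, device="cuda")
k = torch.randn(kv_len, num_heads, head_dim, dtype=torch.bfloat16, device="cuda")
v = torch.randn(kv_len, num_heads, head_dim, dtype=torch.bfloat16, device="cuda")

# Whether the trtllm-gen kernel handles this call is decided by FlashInfer's
# backend dispatch; the public API itself is unchanged.
out = flashinfer.single_prefill_with_kv_cache(q, k, v, causal=True)
```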
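The fused allreduce/allgather/reducescatter combinations from #2563 are easiest to reason about against the plain collectives they combine. A conceptual reference using `torch.distributed` (this is not FlashInfer's fused API; process-group setup and shapes are assumptions):

```python
import torch
import torch.distributed as dist

def comm_combo(x: torch.Tensor):
    """Reference semantics only, using plain NCCL collectives.

    Assumes dist.init_process_group("nccl") has been called, x has the
    same shape on every rank, and dim 0 is divisible by the world size.
    """
    world = dist.get_world_size()
    # allreduce: every rank ends with the elementwise sum.
    summed = x.clone()
    dist.all_reduce(summed, op=dist.ReduceOp.SUM)
    # allgather: every rank ends with all ranks' x stacked along dim 0.
    gathered = x.new_empty(world * x.shape[0], *x.shape[1:])
    dist.all_gather_into_tensor(gathered, x)
    # reducescatter: each rank keeps one summed shard along dim 0.
    shard = x.new_empty(x.shape[0] // world, *x.shape[1:])
    dist.reduce_scatter_tensor(shard, x, op=dist.ReduceOp.SUM)
    return summed, gathered, shard
```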
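The DCP All-to-All kernel from #2951 fuses communication with the context-parallel attention reduction. As a conceptual reference for the communication pattern only (not the kernel's API; `dcp_exchange` and its layout are hypothetical):

```python
import torch
import torch.distributed as dist

def dcp_exchange(partials: torch.Tensor) -> torch.Tensor:
    """Hypothetical reference: exchange equally sized partial-attention
    shards between context-parallel ranks. Slice i of `partials` along
    dim 0 is destined for rank i; assumes an initialized process group
    and dim 0 divisible by the world size. The new kernel additionally
    fuses the attention reduction into this exchange.
    """
    out = torch.empty_like(partials)
    dist.all_to_all_single(out, partials)
    return out
```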
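The hybrid token-bucket spacing from #3115 replaces pure power-of-2 buckets so large token counts stop rounding up by nearly 2x. The autotuner's actual schedule is not reproduced here; a hypothetical illustration of the idea (power-of-2 growth up to a cutoff, then fixed linear steps):

```python
def hybrid_buckets(max_tokens: int, pow2_limit: int = 256, step: int = 256) -> list[int]:
    """Hypothetical hybrid bucket schedule: power-of-2 sizes up to
    pow2_limit, then fixed linear steps, bounding over-allocation for
    large token counts. All names and defaults here are illustrative."""
    buckets = []
    n = 1
    while n <= min(pow2_limit, max_tokens):
        buckets.append(n)
        n *= 2
    n = pow2_limit + step
    while n <= max_tokens:
        buckets.append(n)
        n += step
    if buckets[-1] < max_tokens:
        buckets.append(max_tokens)
    return buckets

# e.g. hybrid_buckets(1024) -> [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 768, 1024]
```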
## New Contributors
- @davidjpyu made their first contribution in #2951
- @StudyingShao made their first contribution in #3115
- @kwen2501 made their first contribution in #2665
- @katjasrz made their first contribution in #3092
- @ianliuy made their first contribution in #3082
- @arpera made their first contribution in #3155
**Full Changelog**: v0.6.9rc1...v0.6.10rc1