flashinfer-ai/flashinfer
Release v0.6.10rc1


What's Changed

  • Vendor CCCL v3.3.2 from GitHub instead of relying on CTK-bundled copy by @kahyunnam in #3091
  • [Fmha] Add head_dim=512 support for trtllm attention kernels by @djmmoss in #2959
  • perf: optimize MXFP4xBF16 & INT4xFP8 CUTLASS MoE backend for SM90 by @samuellees in #3084
  • Add support for the combinations of allreduce, allgather, and reducescatter by @jinyangyuan-nvidia in #2563
  • [Fmha] update trtllm-gen FMHA cubins and sync headers for context SWA fix by @PerkzZheng in #3089
  • Report unit test files with no result by @dierksen in #3105
  • autotuner: check cache before synthesizing profile input tensors by @leejnau in #3126
  • fix: pass skip_softmax_threshold_scale_factor to prefill wrapper in test by @PerkzZheng in #3154
  • feat: Add DCP All-to-All kernel for context-parallel attention reduction by @davidjpyu in #2951
  • perf(autotuner): replace power-of-2 token buckets with hybrid spacing & fix missing routing_replay_out arg by @StudyingShao in #3115
  • feat: Integrate CuTe DSL FMHA prefill kernels by loading cubin by @limin2021 in #3039
  • unifying all reduce memory allocation for single-node and multi-node nvlink by @Amir-19 in #2955
  • Add all-gather matmul by @kwen2501 in #2665
  • Add examples of calling FlashInfer from JAX via jax-tvm-ffi by @katjasrz in #3092
  • bump version to 0.6.9 by @aleozlx in #3123
  • chore: Address non-blocking review feedback for #3051 / #3080 by @bkryu in #3128
  • perf: Add no-bias path for tinygemm_bf16 by @bkryu in #3151
  • feat: Add row_starts and dsa_graph_safe to topk by @zianglih in #3133
  • fix: guard MXFP8 fc1 weight shape check for non-gated activations by @ianliuy in #3082
  • [fix] fix blackwell gdn accuracy issue by @Observer007 in #3156
  • fix: fix OOB issue for vLLM by @nv-yunzheq in #2762
  • Build mnnvl_moe_alltoall with logger and stringUtils by @tiran in #2807
  • CICD bug fix: ensure data/ symlinks exist before jit-cache AOT compilation by @kahyunnam in #3158
  • feat: add get flashinfer-trace interface .fi_trace by @yyihuang in #2931
  • fix(gdn): use physical SM count for SM100 persistent prefill kernel by @arpera in #3155
  • fix(gdn): address remaining CodeRabbit feedback from #3001 by @arpera in #3165
  • Support NVFP4 KV for prefill and batch attention kernels by @Tom-Zheng in #3097
  • fix: skip version check for editable/source installs (0.0.0+unknown) by @ianliuy in #3061
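The autotuner change in #3115 replaces pure power-of-2 token buckets with hybrid spacing. As a rough illustration of the idea (this is a hypothetical sketch, not FlashInfer's actual bucketing code or API), a hybrid scheme might keep fine-grained power-of-2 buckets for small token counts, where relative rounding error is large, and switch to coarser fixed-stride buckets for large counts, where a few extra tokens of padding cost little:

```python
import bisect


def make_hybrid_buckets(max_tokens: int,
                        fine_limit: int = 256,
                        coarse_step: int = 512) -> list[int]:
    """Power-of-2 buckets up to fine_limit, then fixed-stride buckets.

    Illustrative only; the parameter names and cutoffs are assumptions,
    not the values used by FlashInfer's autotuner.
    """
    buckets = []
    b = 1
    while b <= fine_limit:
        buckets.append(b)
        b *= 2
    b = fine_limit + coarse_step
    while b <= max_tokens:
        buckets.append(b)
        b += coarse_step
    if buckets[-1] < max_tokens:
        buckets.append(max_tokens)
    return buckets


def pick_bucket(buckets: list[int], num_tokens: int) -> int:
    """Smallest bucket >= num_tokens, so the tuned shape covers the input."""
    i = bisect.bisect_left(buckets, num_tokens)
    return buckets[min(i, len(buckets) - 1)]
```

Compared to pure power-of-2 buckets, this caps padding waste at `coarse_step` tokens for large inputs instead of letting it grow to nearly half the bucket size.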
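On the last item: editable and source installs report the placeholder version `0.0.0+unknown`, which carries no real release metadata, so any compatibility check against it would spuriously fail. A minimal sketch of the skip logic, assuming a hypothetical helper name (this is not FlashInfer's actual function):

```python
def should_check_version(installed_version: str) -> bool:
    """Return False for placeholder versions from editable/source installs.

    Hypothetical helper: "0.0.0+unknown" is a PEP 440 version with a local
    segment, emitted when no release metadata is available, so comparing it
    against a required version would always fail.
    """
    if installed_version.startswith("0.0.0"):
        return False
    return True
```

A real implementation would then only run its version comparison when this returns True, letting developers working from a source checkout bypass the check.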

New Contributors

Full Changelog: v0.6.9rc1...v0.6.10rc1
