github flashinfer-ai/flashinfer v0.6.0rc2
Release v0.6.0rc2

5 months ago

What's Changed

  • [feat] Integrate SGLang concat_mla_k kernel into flashinfer by @jiahanc in #2237
  • fix: add DeepSeek routing for Bf16xBf16 and MxIntxBf16 TRT-LLM Gen MoE by @nekorobov in #2234
  • fix: Fix compilation with GCC 11 by @dbari in #2242
  • feat: RMSNorm/Fused RMSNorm + FP8 Quantization kernels by @BLaZeKiLL in #2243
  • feat: further optimize top-k and add fused top-k page construction kernels for DSA by @yzh119 in #2215
  • test: Fix MNNVL tests to skip when container lacks SYS_PTRACE capability by @bkryu in #2245
  • Remove cudaStreamSynchronize from gemm_groupwise_sm120.cuh for CUDA graph compatibility by @Copilot in #2244
  • feat: support variable sequence length in decode kernel of trtllm-gen attention by @yaoyaoding in #2125
  • feat: Fused RMSNorm + FP4 Quantization Kernels in CuTe-DSL by @bkryu in #2233
  • Allreduce auto backend improvements by @nvmbreughe in #2239

New Contributors

Full Changelog: v0.6.0rc1...v0.6.0rc2
