flashinfer-ai/flashinfer v0.6.0rc2 on GitHub

What's Changed

[feat] Integrate SGLang concat_mla_k kernel into flashinfer by @jiahanc in #2237
fix: add DeepSeek routing for Bf16xBf16 and MxIntxBf16 TRT-LLM Gen MoE by @nekorobov in #2234
fix: Fix compilation with GCC 11 by @dbari in #2242
feat: RMSNorm/Fused RMSNorm + FP8 Quantization kernels by @BLaZeKiLL in #2243
feat: further optimize top-k and add fused top-k page construction kernels for DSA by @yzh119 in #2215
test: Fix MNNVL tests to skip when container lacks SYS_PTRACE capability by @bkryu in #2245
Remove cudaStreamSynchronize from gemm_groupwise_sm120.cuh for CUDA graph compatibility by @Copilot in #2244
feat: support variable sequence length in decode kernel of trtllm-gen attention by @yaoyaoding in #2125
feat: Fused RMSNorm + FP4 Quantization Kernels in CuTe-DSL by @bkryu in #2233
Allreduce auto backend improvements by @nvmbreughe in #2239

Full Changelog: v0.6.0rc1...v0.6.0rc2