flashinfer-ai/flashinfer v0.6.2 on GitHub

What's Changed

chore: MoE benchmark effective BW fix for trtllm_block_scale_moe by @rosenrodt in #2341
Update Docker CI tags to 20260114-cc1a362 by @flashinfer-bot in #2351
[perf] Improve gemm_fp8_nt_groupwise (cutlass backend) by 10-40% for batch sizes <= 32 by @aidando73 in #2327
feat: Add auto-fixing pre-commit to Claude Code workflows by @yzh119 in #2331
tiny support glm routing by @b8zhong in #2313
fix: Handle zeros in Mistral Large 3 MoE inference by @dbari in #2238
benchmarks: Add norm and quantization routines to microbenchmark harness. by @bkryu in #2362
[CI] Add support for testing dependency commits before release by @yongwww in #2353
feat: introduce GitHub Actions workflow for PR testing by @yongwww in #2326
chore: Add TRTLLM MoE A2A benchmark by @rosenrodt in #2354
Added the cudnn backend Ragged KV Cache wrapper by @Anerudhan in #2352
Enable fp16/bf16/f32 support for selective_state_update (mamba) by @ishovkun in #2366
ci: increase nightly release build timeout by @yongwww in #2371
chore: fix claude git actions by @yzh119 in #2384
chore: add script to run unittests/benchmarks on Modal GPU runners by @yzh119 in #2377
bugfix: hotfix of PR 2366 (mamba kernel) by @yzh119 in #2378
ci: add docker cleanup before running tests by @yongwww in #2386
chore: Refactor benchmark imports to be lazy-loaded by @bkryu in #2388
fix: ensure each CTA processes full numHeadsQPerKv for trtllm decode kernel by @dongjiyingdjy in #2380
ci: add Docker Hub authentication to mitigate pull rate limits by @yongwww in #2393
A Blackwell-optimized version of selective_state_update (mamba) by @ishovkun in #2387
fix: In-place Residual Update for add_rmsnorm_fp4quant by @bkryu in #2385
hotfix: remove uv.lock and add it to .gitignore by @yzh119 in #2399
feat: [Qwen3-Next] Add Cute DSL GDN decode kernel and tests by @HongliMi in #2370
Update Mamba selective_state_scan API signature by @shaharmor98 in #2392
Optimize quantization function in large problem size by @Shunkangz in #2343
feat: Add output_both_sf_layouts option to add_rmsnorm_fp4quant API by @bkryu in #2395
release: bump version to 0.6.2 by @yzh119 in #2411

New Contributors

@rosenrodt made their first contribution in #2341
@aidando73 made their first contribution in #2327
@HongliMi made their first contribution in #2370
@shaharmor98 made their first contribution in #2392
@Shunkangz made their first contribution in #2343

Full Changelog: v0.6.1...v0.6.2
