github flashinfer-ai/flashinfer v0.6.2
Release v0.6.2

3 months ago

What's Changed

  • chore: MoE benchmark effective BW fix for trtllm_block_scale_moe by @rosenrodt in #2341
  • Update Docker CI tags to 20260114-cc1a362 by @flashinfer-bot in #2351
  • [perf] Improve gemm_fp8_nt_groupwise (cutlass backend) by 10-40% for batch sizes <= 32 by @aidando73 in #2327
  • feat: Add auto-fixing pre-commit to Claude Code workflows by @yzh119 in #2331
  • tiny support glm routing by @b8zhong in #2313
  • fix: Handle zeros in Mistral Large 3 MoE inference by @dbari in #2238
  • benchmarks: Add norm and quantization routines to microbenchmark harness. by @bkryu in #2362
  • [CI] Add support for testing dependency commits before release by @yongwww in #2353
  • feat: introduce GitHub Actions workflow for PR testing by @yongwww in #2326
  • chore: Add TRTLLM MoE A2A benchmark by @rosenrodt in #2354
  • Added the cudnn backend Ragged KV Cache wrapper by @Anerudhan in #2352
  • Enable fp16/bf16/f32 support for selective_state_update (mamba) by @ishovkun in #2366
  • ci: increase nightly release build timeout by @yongwww in #2371
  • chore: fix claude git actions by @yzh119 in #2384
  • chore: add script to run unittests/benchmarks on Modal GPU runners by @yzh119 in #2377
  • bugfix: hotfix of PR 2366 (mamba kernel) by @yzh119 in #2378
  • ci: add docker cleanup before running tests by @yongwww in #2386
  • chore: Refactor benchmark imports to be lazy-loaded by @bkryu in #2388
  • fix: ensure each CTA processes full numHeadsQPerKv for trtllm decode kernel by @dongjiyingdjy in #2380
  • ci: add Docker Hub authentication to mitigate pull rate limits by @yongwww in #2393
  • A Blackwell-optimized version of selective_state_update (mamba) by @ishovkun in #2387
  • fix: In-place Residual Update for add_rmsnorm_fp4quant by @bkryu in #2385
  • hotfix: remove uv.lock and add it to .gitignore by @yzh119 in #2399
  • feat: [Qwen3-Next] Add Cute DSL GDN decode kernel and tests by @HongliMi in #2370
  • Update Mamba selective_state_scan API signature by @shaharmor98 in #2392
  • Optimize quantization function in large problem size by @Shunkangz in #2343
  • feat: Add output_both_sf_layouts option to add_rmsnorm_fp4quant API by @bkryu in #2395
  • release: bump version to 0.6.2 by @yzh119 in #2411

New Contributors

Full Changelog: v0.6.1...v0.6.2
