github flashinfer-ai/flashinfer v0.6.3
Release v0.6.3

3 months ago

What's Changed

  • ci: add permission control for public ci tests by @yongwww in #2397
  • Remove cudaMalloc/Free in GDN prefill kernel by @KevinZeng08 in #2415
  • Update cudnn prefill to use correct sequence strides by @vedaanta in #2414
  • perf: mm_fp4 heuristic prioritizes CUTLASS over cuDNN on SM103 by @bkryu in #2404
  • test: add coverage for all cli commands by @sricketts in #1848
  • feat: BF16 GEMM using cuDNN backend by @raayandhar in #2376
  • refactor: simplify fp4 rmsnorm by @yzh119 in #2421
  • feat: update trtllm-gen MoE cubins by @nekorobov in #2416
  • chore/feat: A2A + MoE benchmark; add routed counterpart for trtllm_gen_fp8_fused_moe by @rosenrodt in #2379
  • [CI] Add on-demand rerun for spot-terminated jobs by @yongwww in #2403
  • fix: Fix NaN output in mxfp8_quantize for very small input values by @bkryu in #2441
  • feat: Support Fused MoE non gated Relu2 NVFP4 & FP8 and support Nemotron by @amitz-nv in #2304
  • infra: add manual code owner override support in codeowner_analyzer.py by @sricketts in #2418
  • fix: improve numerical stability of Gumbel sampling by @ixlmar in #2438
  • ci: CI build workflow should always pull fresh and do not cache by @bkryu in #2454
  • Update Docker CI tags to 20260131-a52eff1 by @flashinfer-bot in #2457
  • Revert "feat: Support Fused MoE non gated Relu2 NVFP4 & FP8 and support Nemotron" by @nv-yunzheq in #2451
  • Skip trtllm_alltoall tests on Thor by @dierksen in #2448
  • Fix argument type error in _cudnn_gemm_fp4_requirement by @Kangyan-Zhou in #2450
  • fix: set_log_level now properly sets logger level to enable DEBUG logs by @kahyunnam in #2449
  • bugfix: fix stub generation directory in fused_moe module by @yzh119 in #2445
  • [Perf][Feature] Add SM103-specific schedulers for NVFP4 CUTLASS kernels by @LopezCastroRoberto in #2303
  • ci: set LD_LIBRARY_PATH in Docker images for correct cuBLAS detection by @bkryu in #2468
  • add sgl_kernel.fast_topk_v2 to top_k benchmark by @huangzhilin-hzl in #2461
  • Update Docker CI tags to 20260203-9b5901e by @flashinfer-bot in #2475
  • MTP for mamba by @ishovkun in #2444
  • Add sm90 guard to fence ptx by @jhalabi-nv in #2439
  • perf: improve gdn decode cute-dsl kernels by @yzh119 in #2405
  • ci: migrate release workflows to ci-infra runners by @yongwww in #2467
  • fix: blockscale moe routine supports non-DS routing by @hypdeb in #2476
  • Fix autotuner oom by @zack041 in #2442
  • refactor: reduce hopper's gdn prefill compilation time and fix docstring. by @yzh119 in #2422
  • fix: Fix memory bandwidth calculation in MLA benchmarks by @bkryu in #2479
  • fix: Rename tests/mamba/test_utils.py to tests/mamba/utils.py to fix CI test discovery by @bkryu in #2481
  • Add/update multi node/multi GPU test scripts by @dierksen in #2410
  • feat: Support Fused MoE non gated Relu2 NVFP4 & FP8 and support Nemotron, fixed by @amitz-nv in #2462
  • ci: fix permission errors in release workflow on ci-infra runner by @yongwww in #2488
  • benchmarks: Expand microbenchmark harness to include sampling and RoPe APIs by @bkryu in #2484
  • fix: add support check for gemm config for cutlass moe by @nv-yunzheq in #2495
  • Allow non-DeepSeekV3 routing with one group by @dbari in #2502
  • bump version to 0.6.3 by @aleozlx in #2497
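
The Gumbel sampling entry above (#2438) gives only the PR title, so the actual change is not reproduced here. As general background, the following is a minimal pure-Python sketch of the Gumbel-max trick with one common stability guard: clamping the uniform draw away from 0 and 1 so that neither logarithm receives zero. The function names `gumbel_noise` and `gumbel_max_sample` are hypothetical and are not FlashInfer APIs; the guard shown is an assumption for illustration and may differ from what the PR does.

```python
import math
import random

def gumbel_noise(u):
    """Map a uniform draw u to Gumbel(0, 1) noise, keeping both logs finite."""
    lo = 2.0 ** -1022        # smallest positive normal double
    hi = 1.0 - 2.0 ** -53    # largest double strictly below 1.0
    u = min(max(u, lo), hi)  # clamp (assumed guard, for illustration only)
    return -math.log(-math.log(u))

def gumbel_max_sample(logits, rng):
    """Return index i with probability softmax(logits)[i] via Gumbel-max."""
    scores = [x + gumbel_noise(rng.random()) for x in logits]
    return max(range(len(logits)), key=scores.__getitem__)

# Usage: draw from a two-way distribution with probabilities 0.25 and 0.75.
rng = random.Random(0)
logits = [0.0, math.log(3.0)]
draws = [gumbel_max_sample(logits, rng) for _ in range(20000)]
print(sum(draws) / len(draws))  # fraction of draws that picked index 1
```

Without the clamp, a uniform draw of exactly 0.0 would make `math.log` raise an error in this sketch; in floating-point GPU kernels the analogous failure is typically an infinity or NaN in the perturbed scores.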

New Contributors

Full Changelog: v0.6.2...v0.6.3
