flashinfer-ai/flashinfer v0.6.3 on GitHub

What's Changed

ci: add permission control for public ci tests by @yongwww in #2397
Remove cudaMalloc/Free in GDN prefill kernel by @KevinZeng08 in #2415
Update cudnn prefill to use correct sequence strides by @vedaanta in #2414
perf: mm_fp4 heuristic prioritizes CUTLASS over cuDNN on SM103 by @bkryu in #2404
test: add coverage for all cli commands by @sricketts in #1848
feat: BF16 GEMM using cuDNN backend by @raayandhar in #2376
refactor: simplify fp4 rmsnorm by @yzh119 in #2421
feat: update trtllm-gen MoE cubins by @nekorobov in #2416
chore/feat: A2A + MoE benchmark; add routed counterpart for trtllm_gen_fp8_fused_moe by @rosenrodt in #2379
[CI] Add on-demand rerun for spot-terminated jobs by @yongwww in #2403
fix: Fix NaN output in mxfp8_quantize for very small input values by @bkryu in #2441
feat: Support Fused MoE non gated Relu2 NVFP4 & FP8 and support Nemotron by @amitz-nv in #2304
infra: add manual code owner override support in codeowner_analyzer.py by @sricketts in #2418
fix: improve numerical stability of Gumbel sampling by @ixlmar in #2438
ci: CI build workflow should always pull fresh and do not cache by @bkryu in #2454
Update Docker CI tags to 20260131-a52eff1 by @flashinfer-bot in #2457
Revert "feat: Support Fused MoE non gated Relu2 NVFP4 & FP8 and support Nemotron" by @nv-yunzheq in #2451
Skip trtllm_alltoall tests on Thor by @dierksen in #2448
Fix argument type error in _cudnn_gemm_fp4_requirement by @Kangyan-Zhou in #2450
fix: set_log_level now properly sets logger level to enable DEBUG logs by @kahyunnam in #2449
bugfix: fix stub generation directory in fused_moe module by @yzh119 in #2445
[Perf][Feature] Add SM103-specific schedulers for NVFP4 CUTLASS kernels by @LopezCastroRoberto in #2303
ci: set LD_LIBRARY_PATH in Docker images for correct cuBLAS detection by @bkryu in #2468
add sgl_kernel.fast_topk_v2 to top_k benchmark by @huangzhilin-hzl in #2461
Update Docker CI tags to 20260203-9b5901e by @flashinfer-bot in #2475
MTP for mamba by @ishovkun in #2444
Add sm90 guard to fence ptx by @jhalabi-nv in #2439
perf: improve gdn decode cute-dsl kernels by @yzh119 in #2405
ci: migrate release workflows to ci-infra runners by @yongwww in #2467
fix: blockscale moe routine supports non-DS routing by @hypdeb in #2476
Fix autotuner oom by @zack041 in #2442
refactor: reduce hopper's gdn prefill compilation time and fix docstring. by @yzh119 in #2422
fix: Fix memory bandwidth calculation in MLA benchmarks by @bkryu in #2479
fix: Rename tests/mamba/test_utils.py to tests/mamba/utils.py to fix CI test discovery by @bkryu in #2481
Add/update multi node/multi GPU test scripts by @dierksen in #2410
feat: Support Fused MoE non gated Relu2 NVFP4 & FP8 and support Nemotron, fixed by @amitz-nv in #2462
ci: fix permission errors in release workflow on ci-infra runner by @yongwww in #2488
benchmarks: Expand microbenchmark harness to include sampling and RoPE APIs by @bkryu in #2484
fix: add support check for gemm config for cutlass moe by @nv-yunzheq in #2495
Allow non-DeepSeekV3 routing with one group by @dbari in #2502
bump version to 0.6.3 by @aleozlx in #2497

New Contributors

@KevinZeng08 made their first contribution in #2415
@vedaanta made their first contribution in #2414
@ixlmar made their first contribution in #2438
@Kangyan-Zhou made their first contribution in #2450
@LopezCastroRoberto made their first contribution in #2303
@huangzhilin-hzl made their first contribution in #2461
@zack041 made their first contribution in #2442

Full Changelog: v0.6.2...v0.6.3
