Highlights
Model Support
API
Feature
- Add CuTe DSL single-pass multi-CTA cluster top-k (#12354)
- Account for reusable KV cache blocks in micro-batch scheduler capacity scheduling (#11637)
- Add raster-along-M/N support for blockscaled contiguous backbone kernels in CuTe DSL MoE (#12079)
- Add stride support for `conv1d` and `fused_sigmoid_gating_delta_rule_update` (#12442)
- Add a safe allgather implementation with chunking (#12174)
- Add dynamic SMEM block routing in MoE (#12456)
- Optimize `mamba_mixer2.py` decode performance (#11843)
- Add PDL support to CuTe DSL top-k kernels (#12506)
- Add FlexKV support (#12512)
- Add a KV cache-aware ADP router for prefix-affinity request routing (#12315)
Fix
- Fix KV token estimation when ADP is enabled (#12099)
- Fix Eagle MLA target with GQA draft support (#12171)
- Fix Qwen 3.5 3D position ID handling (#12114)
- Switch tests to `TorchSampler` and fix related bugs (#12200)
- Use `ceil_div` for head and size sharding (#12441)
- Remove redundant D2H synchronization to improve performance (#12445)
- Fix parallel WAN VAE when `return_dict=True` (#12460)
- Fix Triton resmooth kernel crashes on SM100f for large MoE grids (#12397)
- Use a model-level warmup cache key for visual generation pipelines (#12516)
- Add NVTX annotations in `sampler.py` (#12459)
- Use `extra_visual_gen_options` to improve visual generation routing (#12487)
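
The `ceil_div` fix listed above (#12441) concerns sharding sizes that may not divide evenly across ranks. The following is a minimal sketch of why rounding up matters, not the TensorRT-LLM implementation: both `ceil_div` as written here and the `shard_sizes` helper are hypothetical illustrations.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up (assumes a >= 0 and b > 0)."""
    return (a + b - 1) // b


def shard_sizes(total: int, num_ranks: int) -> list[int]:
    """Split `total` items across `num_ranks`, hypothetical helper.

    Each rank gets at most ceil_div(total, num_ranks) items, so no item
    is dropped when `total` is not a multiple of `num_ranks`.
    """
    per_rank = ceil_div(total, num_ranks)
    sizes = []
    remaining = total
    for _ in range(num_ranks):
        take = min(per_rank, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


if __name__ == "__main__":
    # Floor division would assign 10 // 4 = 2 items per rank and cover only 8.
    print(shard_sizes(10, 4))
```

With floor division, the trailing items of an uneven split would be left unassigned; rounding up keeps every item covered, at the cost of a smaller final shard.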
Documentation
Test & Infra
- Save unittest subtest results periodically (#11850)
- Fix the B200 aggregated CI perf test MPI issue (#12347)
- Fix LoRA config handling when the provided config count is below requirements (#12409)
- Add a unit test for `load_state_dict` safetensors fallback (#12408)
- Replace the skipped TRTLLM NVFP4 test in the B300 CI list (#12454)
- Fix the ltx-2 model checkpoint issue in VBench eval tests (#12463)
- Fix the concurrent write issue in perf tests (#12484)
- Update dependencies to align with the NGC PyTorch 26.02 stack (#12102)
- Consolidate PyTransceiver code (#12342)
- Add Eagle coverage with different input/output cases on Spark (#12520)
What's Changed
- [None][infra] Waive 4 failed cases for main in post-merge 2611 by @ZhanruiSunCh in #12433
- [None][test] Fix lora config less than required config number by @yufeiwu-nv in #12409
- [https://nvbugs/5916151][fix] Unwaive test_fused_moe_w4a8_nvfp4_fp8[TRTLLM] by @xxi-nv in #12400
- [https://nvbugs/5963423][fix] Fix kv token estimation when ADP is on. by @dominicshanshan in #12099
- [TRTLLM-11229][infra] Save unittest subtest results periodically by @yiqingy0 in #11850
- [None][chore] Add failed cases into waives.txt by @xinhe-nv in #12426
- [https://nvbugs/5997090][fix] Fix B200 Aggregated CI Perf Test MPI Issue by @chenfeiz0326 in #12347
- [TRTLLM-10407][perf] Add cute dsl single pass multi cta cluster topk by @limin2021 in #12354
- [TRTLLM-11070][feat] Account for reusable KV cache blocks in micro batch scheduler capacity scheduling. by @SimengLiu-nv in #11637
- [None][chore] Fixing guardword check by @pcastonguay in #12455
- [None][infra] Waive 1 failed cases for main in post-merge 2610 by @ZhanruiSunCh in #12434
- [None][feat] CuteDSL MOE: Add raster along M/N support for blockscaled contiguous backbone kernel by @liyuhannnnn in #12079
- [None][fix] Switch tests to TorchSampler and fix bugs by @Funatiq in #12200
- [TRTLLM-10061][fix] Use ceil_div for head/size calculations by @VALLIS-NERIA in #12441
- [TRTLLM-10061][feat] Add stride support for conv1d and fused_sigmoid_gating_delta_rule_update by @VALLIS-NERIA in #12442
- [None][fix] Eagle: MLA Target + GQA Draft by @IzzyPutterman in #12171
- [None][doc] fix outdated code references in tech blogs 2, 3, 4, 8, 9, 11 by @schetlur-nv in #12338
- [TRTLLM-11471][feat] Add safe version of allgather with chunking by @chienchunhung in #12174
- [None][perf] add Dynamic SMEM block routing in MOE by @jiahanc in #12456
- [TRTLLM-11544][feat] Add Qwen 3.5 supporting(NVFP4). by @nv-guomingz in #12302
- [https://nvbugs/5997090][fix] Add Disagg Perf Test back as MPI Issue has been fixed by @chenfeiz0326 in #12458
- [https://nvbugs/5841976][fix] Remove test_fused_moe_alltoall_fp4[DeepEP] from waives by @xxi-nv in #12405
- [None][infra] Waive 2 failed cases for main in post-merge 2613 by @ZhanruiSunCh in #12473
- [https://nvbugs/5866619][test] Add unit test for load_state_dict safetensors fallback by @crazydemo in #12408
- [None][feat] Fuse all_reduce with norm for nemotron_h models by @Wanli-Jiang in #12410
- [None][infra] Update CI allowed list by @yuanjingx87 in #12488
- [https://nvbugs/6013562][test] Update waive by @xinhe-nv in #12492
- [None][feat] Small optimizations for mamba_mixer2.py decode by @hnover-nv in #11843
- [None][infra] Waive flaky DeepSeekV3Lite disagg serving test by @hyukn in #12494
- [#11526][chore] AutoDeploy accuracy tests: Use Llama3.1-8B-Instruct official checkpoints by @galagam in #12285
- [https://nvbugs/6007285][fix] Replace skipped TRTLLM NVFP4 test in B300 CI list by @xxi-nv in #12454
- [https://nvbugs/5983390][fix] Remove redundant D2H sync to optimize perf by @hyukn in #12445
- [https://nvbugs/5987470][fix] BREAKING: Do not normalize log probs by default by @achartier in #12366
- [TRTLLM-11622][fix] fix parallel WAN vae when return_dict=True by @NVShreyas in #12460
- [None][infra] Waive pre-merge failed 5090 test by @yuanjingx87 in #12486
- [None][infra] Waive flaky DeepSeekV3Lite disagg serving test by @bo-nv in #12518
- [None][chore] Fix ltx-2 Model Checkpoint Issue in VBench Eval Tests by @yibinl-nvidia in #12463
- [https://nvbugs/5962591][fix] Fix Triton resmooth kernel crash on SM100f for large MoE grids by @Barry-Delaney in #12397
- [None][chore] Add failed cases into waives.txt by @xinhe-nv in #12495
- [None][doc] Document temperature-adjusted logprobs in TRT backend by @achartier in #12514
- [None][feat] Add PDL support to CuTE DSL top-k kernels by @limin2021 in #12506
- [None][infra] Waive 4 failed cases for main in post-merge 2617 by @ZhanruiSunCh in #12536
- [None][doc] Update Python coding guidelines. by @hnover-nv in #12439
- [#12290][fix] Qwen 3.5 fix 3d position ID handling by @bmarimuthu-nv in #12114
- [TRTLLM-10820][infra] Update dependencies to align with NGC PyTorch 26.02 stack by @EmmaQiaoCh in #12102
- [https://nvbugs/6015329][fix] Use model-level warmup cache key for visual gen pipelines by @karljang in #12516
- [TRTLLM-9523][chore] PyTransceiver code consolidation by @Shixiaowei02 in #12342
- [None][test] Add different input-output of eagle cases on Spark by @JennyLiu-nv in #12520
- [https://nvbugs/6011086][fix] Fix Perf Test's Concurrent Write Issue by @chenfeiz0326 in #12484
- [None][fix] NVTX annotation in sampler.py by @ixlmar in #12459
- [https://nvbugs/5998489][feat] Adding support for request priority in LLM API by @pcastonguay in #12362
- [None][feat] Add support for FlexKV by @pcastonguay in #12512
- [None][feat] KV cache-aware ADP router for prefix-affinity request routing by @lancelly in #12315
- [https://nvbugs/6008183][fix] Use extra_visual_gen_options to help de… by @JunyiXu-nv in #12487
- [None][test] Waive a flaky test case on Dis-agg serving with Nemotron… by @nv-guomingz in #12578
- [None][chore] Bump version to 1.3.0rc10 by @yuanjingx87 in #12511
- [None][chore] Fixing guardword check by @VALLIS-NERIA in #12579
Full Changelog: v1.3.0rc9...v1.3.0rc10