Highlights
Model Support
-
API
-
Feature
- Update disagg slurm scripts (#10712)
- Re-implement MicroBatchScheduler and CapacityScheduler in Python (#10273)
- Fix sharding dashboard errors (#10786)
- Async Transfer Manager (#9891)
- Speculative One Model: FlashInfer sampling (#10284)
- Refactor speculative decoding workers (#10768)
- Use global unique id as disagg request id (#10187)
- Enable guided decoding with reasoning parsers (#10890)
- Support partial update weight for fp8 (#10456)
- Multi-LoRA serving with CUDA Graph (#8279)
- Support logprobs for Completions API (#10809)
- Eagle3 Specdec UX improvements (#10124)
- Python transceiver components (step 2) (#10494)
- Upgrade NIXL to v0.9.0 (#10896)
- KV Connector Support for MTP (#10932)
- Support overlap scheduler for disagg ctx instances (#10755)
- Add implementation of KVCacheManagerV2 (#10736)
- Switch to ConfigurableMoE as the default path (#10792)
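As an illustration of the "Support logprobs for Completions API" item, the sketch below only builds a request body; it does not contact a server. It assumes the endpoint follows the OpenAI-style Completions request schema, in which `logprobs` is an integer count of top log-probabilities to return per token. The helper name `build_completion_request` and the model name are hypothetical and do not come from the PR.

```python
import json


def build_completion_request(model: str, prompt: str,
                             logprobs: int = 1, max_tokens: int = 16) -> str:
    """Serialize a Completions-style request body that asks for logprobs.

    Assumption: the server accepts the OpenAI-style schema, where
    `logprobs` is a non-negative integer. Check the serving docs for
    the fields actually supported.
    """
    if logprobs < 0:
        raise ValueError("logprobs must be non-negative")
    payload = {
        "model": model,            # placeholder model name
        "prompt": prompt,
        "max_tokens": max_tokens,
        "logprobs": logprobs,      # number of top log-probabilities per token
    }
    return json.dumps(payload)


# Build the body only; sending it (e.g. with an HTTP client) is left to the caller.
body = build_completion_request("my-model", "Hello, world", logprobs=2)
```

The resulting JSON string can be posted to a Completions endpoint with any HTTP client; the response format should be confirmed against the server's documentation.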
Fix
- Enable system memory to transfer active message in NIXL ucx (#10602)
- Fix the potential misaligned access due to vectorized ld/st instructions in NVLinkOneSided A2A (#10539)
- Default disable gemm+allreduce fusion (#10656)
- Fix vulnerability urllib3 and nbconvert (#10551)
- Fix overlap scheduler race condition (#10610)
- Replace pickle.load with restricted Unpickler (#10622)
- Fix copy start_logs in disagg slurm scripts (#10840)
- Cherry-pick: Disable short profile for tunable ops with MERGE strategy (#10844, #10715)
- Lock resource to fix potential access to released data (#10827)
- Cherry-pick: Fix accuracy issue of TWO-SHOT AllReduce kernel (#10841, #10654)
- Remove weight tensor holder to release memory earlier (#10876)
- Add missing dist strategy param and fix typo for ad_logger (#10892)
- Update RMSNorm custom op plumbing (#10843)
- Fix hmac launch (#10434)
- Avoid double update for previous batch (#9888)
- Re-init TRTLLM sampler to use sample stream in multi-stream cases (#10918)
- Fix MTP with async scheduler (#10941)
- Fix buffer reuse (#10716)
- Cherry-pick: Fix hanging issue for MNNVL Allreduce under PP (#10750, #10633)
- Workaround for flashinfer.sampling.sampling_from_logits (#10713)
- Fix port 8000 being used issue in stress test (#10756)
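For readers unfamiliar with the technique named in "Replace pickle.load with restricted Unpickler", here is a minimal sketch of the general pattern from the Python standard library documentation: subclass `pickle.Unpickler` and override `find_class` so that only allow-listed globals can be resolved. The allow-list and the helper name `restricted_loads` are illustrative assumptions, not the code from the PR.

```python
import io
import pickle

# Hypothetical allow-list for illustration; a real one should contain
# exactly the (module, name) pairs the application needs.
ALLOWED_GLOBALS = {
    ("collections", "OrderedDict"),
}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve globals outside the allow-list."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(
            f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Drop-in analogue of pickle.loads() that uses the restricted unpickler."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain containers and scalars need no global lookup, so they load normally.
config = restricted_loads(pickle.dumps({"layers": [1, 2, 3]}))
```

Payloads that reference any other global, such as a built-in function, raise `pickle.UnpicklingError` instead of being resolved. This narrows the attack surface but does not make unpickling untrusted data fully safe.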
Documentation
-
Test & Infra
- Upload regression info to artifactory (#10599)
- Add sonarqube scanning in lockfile generation pipeline (#10700)
- Add Nemotron Nano v3 FP8 autodeploy perf test (#10603)
- Remove trt flow tests in NIM (#10731)
- Update config.yaml of slurm scripts to align with submit.py change (#10802)
- Add a timeout in MNNVL throughput to prevent hangs if one rank crashes (#9532)
- Trigger multi-gpu tests when install_nixl/ucx.sh is modified (#10624)
- Add DGX-Spark VLM accuracy and perf spec dec cases (#10804)
- Fix test list llm_spark_func.txt (#10921)
- Add multi-GPU test for the ConfigurableMoE module (#10699)
- NVFP4 MoE - Move weights transformation to fusion phase (#10803)
- Update flashinfer-python to 0.6.1 (#10872)
- Improve disagg acc tests (#10833)
- Refine placement group in ray executor (#10235)
- Regenerate outdated lock file (#10940)
- Remove long-running sanity check tests on GH200 (#10924, #10969)
- Add dgx-spark beta notes (#10766)
- Modify ctx config in 128k8k disagg cases (#10779)
- Balanced random MoE workload generator for CuteDSL kernel UT, autotuner and layerwise benchmark (#10279)
What's Changed
- [#10696][fix] AutoDeploy prevent torch.export from specializing batch dimension when max_batch_size=1 by @MrGeva in #10697
- [None][infra] Add sonarqube scanning in lockfile generation pipeline by @yuanjingx87 in #10700
- [https://nvbugs/5769712][fix] fix timeout in AutoDeploy llama accuracy test by @lucaslie in #10461
- [#10688][fix] AutoDeploy Fix CUDA graph batch sizes exceeding max_batch_size by @MrGeva in #10687
- [#10642][feat] AutoDeploy: optimized canonicalize_graph utilities [1/2] by @lucaslie in #10675
- [https://nvbugs/5769890][fix] enable system memory to transfer active message in NIXL ucx by @chuangz0 in #10602
- [https://nvbugs/5814247][fix] unwaive AutoDeploy multi-gpu unit tests by @lucaslie in #10769
- [TRTLLM-10300][feat] Upload regression info to artifactory by @chenfeiz0326 in #10599
- [None][chore] Add release/1.2 branch into lockfile generation schedule by @yiqingy0 in #10790
- [TRTLLM-9581][infra] Use /home/scratch.trt_llm_data_ci in computelab by @ZhanruiSunCh in #10616
- [None][infra] Waive failed cases for main on 01/19 by @EmmaQiaoCh in #10794
- [#10607][chore] Add Nemotron Nano v3 FP8 autodeploy perf test by @MrGeva in #10603
- [None][feat] Update disagg slurm scripts by @qiaoxj07 in #10712
- [None][test] adjust the dis-agg test timeout threshold by @Shixiaowei02 in #10800
- [None][chore] docs: clarify LoRA is not supported with --use_fp8_rowwise in Fp8RowwiseAttention (see #2603) by @ssam18 in #10320
- [None][chore] Remove trt flow tests in NIM by @jieli-matrix in #10731
- [None][chore] update config.yaml of slurm scripts to align with submit.py change by @dc3671 in #10802
- [https://nvbugs/5776445][chore] unwaive test by @reasonsolo in #10667
- [TRTLLM-10029][scheduler] Re-implement MicroBatchScheduler and CapacityScheduler in Python by @lancelly in #10273
- [TRTLLM-10296][fix] Fix the potential misaligned access due to vectorized ld/st instructions in NVLinkOneSided A2A. by @bobboli in #10539
- [None][chore] Add failed cases into waives.txt by @xinhe-nv in #10776
- [None][fix] default disable gemm+allreduce fusion by @benzh-2025 in #10656
- [TRTLLM-8638][fix] Add failed cases into waives.txt by @xinhe-nv in #10787
- [None][fix] Fix vulnerability urllib3 and nbconvert by @yiqingy0 in #10551
- [None][test] Update sanity test list by @xinhe-nv in #10825
- [None][fix] Remove unused params in attn by @yizhang-nv in #10652
- [TRTLLM-10785][feat] Fix sharding dashboard errors by @greg-kwasniewski1 in #10786
- [https://nvbugs/5701445][chore] unwaive test. by @yuxianq in #10806
- [None][infra] trigger multi-gpu tests when install_nixl/ucx.sh is mod… by @bo-nv in #10624
- [None][infra] Waive failed cases for main branch on 01/20 by @EmmaQiaoCh in #10829
- [None][chore] Reduce tedious logs by @chzblych in #10847
- [#10707][fix] AutoDeploy: Super accuracy test fixes by @galagam in #10717
- [None][chore] Async Transfer Manager by @jthomson04 in #9891
- [None][fix] fix duplicate entry in waives.txt by @lucaslie in #10853
- [None][feat] Speculative One Model: FlashInfer sampling by @IzzyPutterman in #10284
- [https://nvbugs/5670108][fix] Fix overlap scheduler race condition in… by @SimengLiu-nv in #10610
- [https://nvbugs/5760737][test] only skip mooncake+indexerkcache test by @zhengd-nv in #10266
- [https://nvbugs/5759698][fix] unwaive test_base_worker by @Superjomn in #10669
- [None][fix] Add a timeout in MNNVL throughput to prevent hangs if one rank crashes by @djns99 in #9532
- [https://nvbugs/5670458][chore] Unwaive reward model test by @shuyixiong in #10831
- [None][chore] Revert #10847 by @chzblych in #10869
- [https://nvbugs/5775021][fix] Replace pickle.load with restricted Unpickler by @yibinl-nvidia in #10622
- [None][fix] Fix copy start_logs in disagg slurm scripts by @qiaoxj07 in #10840
- [None][fix] Cherry-pick #10715: Disable short profile for tunable ops with MERGE strategy by @hyukn in #10844
- [https://nvbugs/5740377][fix] Lock resource to fix potential access to released data by @HuiGao-NV in #10827
- [https://nvbugs/5814253][fix] unwaive test_autotuner_distributed_strategy tests by @hyukn in #10793
- [None][chore] switch to ConfigurableMoE as the default path by @xxi-nv in #10792
- [None][infra] Waive failed cases for main branch on 01/21 by @EmmaQiaoCh in #10882
- [https://nvbugs/5636916][fix] Cherry-pick #10654: Fix accuracy issue of TWO-SHOT AllReduce kernel by @hyukn in #10841
- [None][chore] unwaive qwen3 235B accuracy test by @kris1025 in #10493
- [TRTLLM-10325][feat] Refactor speculative decoding workers by @cascade812 in #10768
- [None][infra] Fix SonarQube job hang by creating the Jenkins home folder if it does not exist by @yuanjingx87 in #10830
- [https://nvbugs/5816267][fix] Remove weight tensor holder to release memory earlier by @dongxuy04 in #10876
- [https://nvbugs/5784543][chore] unwaive test. by @yuxianq in #10835
- [None][feat] GLM-4.5-Air support by @videodanchik in #10653
- [TRTLLM-10059][feat] Use global unique id as disagg request id by @reasonsolo in #10187
- [None][chore] Add DGX-Spark VLM accuracy and perf spec dec cases by @JennyLiu-nv in #10804
- [None][feat] K-EXAONE MTP support by @yechank-nvidia in #10796
- [#8241][feat] Support model_kwargs for pytorch backend by @taylor-yb-lee in #10351
- [TRTLLM-10154][feat] Enable guided decoding with reasoning parsers by @syuoni in #10890
- [None][fix] Fix waived tests for Nemotron-h models by @Wanli-Jiang in #10758
- [TRTLLM-9771][feat] Support partial update weight for fp8 by @shuyixiong in #10456
- [None][feat] Add KV cache cleanup by @pengbowang-nv in #7439
- [https://nvbugs/5811159][fix] Unwaive bug 5811159. by @bobboli in #10903
- [#10838][fix] Add missing dist strategy param. fix typo for ad_logger… by @tcherckez-nvidia in #10892
- [None][ci] Fix test list llm_spark_func.txt by @syuoni in #10921
- [None][chore] Bump version to 1.3.0rc1 by @yiqingy0 in #10923
- [None][chore] NVFP4 MoE - Move weights transformation to fusion phase… by @tcherckez-nvidia in #10803
- [https://nvbugs/5741304][chore] Update flashinfer-python to 0.6.1 by @yihwang-nv in #10872
- [https://nvbugs/5322131][feat] Multi-LoRA serving with CUDA Graph by @JyChang012 in #8279
- [None][fix] Update RMSNorm custom op plumbing by @JintaoPengCS in #10843
- [TRTLLM-10388][feat] Support logprobs for Completions API by @LinPoly in #10809
- [https://nvbugs/5768068][chore] improve disagg acc tests by @bo-nv in #10833
- [https://nvbugs/5783876][fix] fix hmac launch by @Superjomn in #10434
- [TRTLLM-10590][feat] Eagle3 Specdec UX improvements by @venkywonka in #10124
- [TRTLLM-9527][doc] Add NIXL as a Python attribution (step 2) by @Shixiaowei02 in #10910
- [TRTLLM-9527][feat] Python transceiver components (step 2) by @Shixiaowei02 in #10494
- [None][fix] Avoid Double update for previous batch by @yizhang-nv in #9888
- [https://nvbugs/5819002][fix] fix sharding tests by @greg-kwasniewski1 in #10775
- [#9306][refactor] Refactor AutoDeployConfig into LlmArgs by @2ez4bz in #10613
- [https://nvbugs/5688721][fix] unwaive NemotronH accuracy test by @lucaslie in #10852
- [None][infra] Update CI allowlist by @yuanjingx87 in #10936
- [TRTLLM-9108][feat] Add test configurable moe module multi gpu by @leslie-fang25 in #10699
- [None][test] Remove unused test list by @StanleySun639 in #10916
- [None][feat] Upgrade NIXL to v0.9.0 by @zackyoray in #10896
- [None][infra] Waive a failed case in pre-merge stage by @EmmaQiaoCh in #10948
- [https://nvbugs/5833795][chore] Waive test test_e2e.py::test_ptp_quickstart_advanced[GPT-OSS-120B-gpt_oss/gpt-oss-120b] by @yihwang-nv in #10953
- [None][chore] refine placement group in ray executor by @Superjomn in #10235
- [https://nvbugs/5814215][fix] Unwaive test_trtllm_flashinfer_symbol_collision.py::test_flashinfer_fused_moe_matches_torch_moe by @yihwang-nv in #10930
- [None][infra] Regenerate outdated lock file by @yuanjingx87 in #10940
- [https://nvbugs/5707359][fix] Unwaive the test that due to flashinfer… by @liji-nv in #10570
- [None][feat] AutoDeploy: Enhance memory consumption for MoE fusion transform by @taylor-yb-lee in #10772
- [None][feat] KV Connector Support for MTP by @jthomson04 in #10932
- [TRTLLM-10334][feat] Support overlap scheduler for disagg ctx instances by @kaiyux in #10755
- [None][ci] Remove long-running sanity check tests on GH200 (#10924) by @chzblych in #10969
- [None][infra] Fix TRT-LLM data scratch mount point for gb10x by @EmmaQiaoCh in #10880
- [https://nvbugs/5829097][fix] Re-init TRTLLM sampler to use sample stream in multi-stream cases. by @yuxianq in #10918
- [TRTLLM-7738][feat] Adding implementation of KVCacheManagerV2 by @lowsfer in #10736
- [None][fix] Bugfix/mtp with async scheduler by @pcastonguay in #10941
- [None][chore] Mass integration of release/1.2 by @dominicshanshan in #10888
- [TRTLLM-10147][perf] Balanced random MoE workload generator for CuteDSL kernel UT, autotuner and layerwise benchmark by @syuoni in #10279
- [None][test] Waive failed tests on main 1/25 by @chzblych in #10984
New Contributors
- @ssam18 made their first contribution in #10320
- @cascade812 made their first contribution in #10768
- @videodanchik made their first contribution in #10653
- @taylor-yb-lee made their first contribution in #10351
- @JyChang012 made their first contribution in #8279
Full Changelog: v1.3.0rc0...v1.3.0rc1