Security Vulnerabilities
GnuPG Vulnerability
A security vulnerability has been identified in GnuPG versions prior to 2.4.9, which affects the Ubuntu 24.04 LTS system used by the TensorRT LLM base image. For details, please refer to the official Ubuntu advisory: CVE-2025-68973. An official patched Ubuntu package is still pending; the fix will be included in the next TensorRT LLM release once the updated package is published and incorporated. To mitigate potential risks immediately, users are advised to manually upgrade GnuPG to version 2.4.9 or later.
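Until the patched package lands, one interim check is to compare the installed GnuPG version against the 2.4.9 minimum. The helper below is a minimal sketch (the function name and version strings are illustrative, not part of TensorRT LLM or GnuPG tooling):

```python
# Minimal sketch: decide whether a dotted GnuPG version string meets the
# patched 2.4.9 minimum. The version strings below are illustrative examples.
def is_patched(version: str, minimum: tuple = (2, 4, 9)) -> bool:
    """Compare a dotted version string against a minimum version tuple."""
    parts = tuple(int(p) for p in version.split("."))
    return parts >= minimum

# e.g. feed in the number extracted from the first line of `gpg --version`
print(is_patched("2.4.8"))   # False: still vulnerable
print(is_patched("2.4.9"))   # True: patched
print(is_patched("2.5.0"))   # True: newer than the fix
```

On an Ubuntu 24.04 system, `apt-get install --only-upgrade gnupg` (once the patched package is published) or a source build of GnuPG 2.4.9 would satisfy this check.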
Hugging Face Transformers Vulnerabilities
Several security vulnerabilities have been disclosed in the Hugging Face Transformers library used by TensorRT LLM. Because these issues originate in an upstream dependency, remediation depends on a patch from the Hugging Face team. We are actively monitoring the situation and will update TensorRT LLM to include the necessary fixes once a stable Transformers release addressing these vulnerabilities becomes available. Affected CVEs: CVE-2025-14920, CVE-2025-14921, CVE-2025-14924, CVE-2025-14927, CVE-2025-14928, CVE-2025-14929, CVE-2025-14930
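In the meantime, deployments that run dependency audits can flag these advisories explicitly. The sketch below shows one way to filter an audit report for the affected IDs; the report format is an assumption for illustration, not a TensorRT LLM or pip-audit API:

```python
# Minimal sketch: flag the upstream Transformers CVEs listed above in a
# dependency-audit report. The report structure (list of dicts with an "id"
# key) is an assumption for illustration.
AFFECTED_CVES = {
    "CVE-2025-14920", "CVE-2025-14921", "CVE-2025-14924",
    "CVE-2025-14927", "CVE-2025-14928", "CVE-2025-14929",
    "CVE-2025-14930",
}

def flag_transformers_cves(report):
    """Return the advisory IDs in `report` that match the affected list."""
    return sorted(entry["id"] for entry in report if entry["id"] in AFFECTED_CVES)

# Example: one affected advisory and one unrelated advisory
report = [{"id": "CVE-2025-14920"}, {"id": "CVE-2024-0001"}]
print(flag_transformers_cves(report))  # ['CVE-2025-14920']
```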
Highlights
Model Support
- Add Qwen3-VL-MoE (#9689)
- Support DeepSeek-V32 chat template (#9814)
- Support DeepSeek-V3.2, R1 and V3.1 tool parser (#10126, #10010)
- Support Eagle3 on Mistral Large3 (#9971)
- Support VLM part for Mistral Large 3 (#10188)
- Support multi-GPU execution for Nemotron v3 Nano and Super (#10118)
- Support Qwen3-VL dense model in pytorch backend (#9060)
- Support NVFP4 for gptoss (#8956)
- Add MLA Based Eagle (#9677)
Feature
- Support NVFP4 weight and weight_scale padding for MoE cutlass (#9358)
- Add routing support for the new model for cutlass and TRTLLM MoE backend (#9792)
- Improve disagg-server prometheus metrics and synchronize dynamic workers’ clocks (#9726)
- Update TRT-LLM Gen MoE for NvFp4 + bias with tileN=256 (#9734)
- Add optimization options for MOE CuteDSL finalized kernel (#10042)
- Add fp8 bmm on sm120 (#9687)
- Reuse alltoall workspace for CuteDSL MoE output (#9840)
- Support Mooncake transfer engine as cache transceiver backend (#8309)
- Enable KV cache reuse for config database (#10094)
- Enable PDL for CuteDSL kernels and overlap MoeOutputMemset (#10043)
- Cudagraph updates for helix parallelism (#10141)
- Custom AllToAll for helix parallelism (#9986)
- Pass MRoPE tensors for EPD disagg (#9758)
- Reuse previous draft requests if possible (#10263)
- Make PDL enabled by default (#9695)
- Enable 2CTA with autotune for CuteDSL MoE and Grouped GEMM optimizations (#10201)
- Provide attention NVFP4 out support for torch compile (#9740)
- Increase topk upper limit to 22 for NVLinkOneSided AlltoAll (#10229)
- Deliver optimizations for two-model speculative decoding (#10208)
Fix
- Fix PDL bug in trtllm-gen FMHA kernels (#9913)
- Fix Illegal Memory Access for CuteDSL Grouped GEMM (#10008)
- Disable tvm_ffi for CuteDSL nvFP4 dense GEMM (#10040)
- Fix ready signal in NIXL backend (#10000)
- Fix top_k=10 in NVLinkOneSided AlltoAll (#10197)
- Fix race conditions in KV cache communication during unexpected termination (#10076)
- Fix deepseek sharding (#9984)
- Fix contiguous view usage in load_expert weights (#10136)
- Fix detokenizer issue for DeepSeek-v3.2 (#10106)
- Fix index offset overflow in custom Top-K kernel and UT (#10027)
- Fix draft_lengths for CUDA graph capture (#10004)
- Fix port conflict handling for CI (#10392, #10175, #10035)
- Fix NVFP4 linear method weight and weight_scale padding (#10148)
- Fix VSWA block store/load scheme in KV cache manager (#10183)
- Fix ready signal and execution_stream synchronization across components (#10060)
- Fix PP+CP combination with helix parallelism (#10312)
- Fix Gemma3 RoPE for local attention (#9961)
- Make NCCL resource manager destructor exception-safe (#10166)
- Fix detokenizer / tokenizer issues (use local tokenizer, cache vocab) (#10230, #10219)
- Disable PDL for quant kernels to address accuracy (#10285)
- Avoid using property with setter in nn modules (#10212)
Documentation
- Add README for Nemotron Nano v3 (#10017)
- Update CONTRIBUTING.md (#10023)
- Update online benchmarking docs (#9611)
- Update Dynamo Example document (#9619, #10368)
- Update Perf_Overview.md with benchmarking results (#9723)
- Add NIXL-Libfabric usage documentation (#10205)
- Add Sparse Attention feature doc (#9648)
- Update IFB performance guide & GPTOSS deployment guide (#10283)
- Promote perfect MoE router feature documentation (#10303)
Test & Infra
- Fix credential loading in lockfile generation pipeline (#10020)
- Add Qwen3-4B-Eagle3 one-model perf test (#10041)
- Add regression testing for config database (#9832)
- Update tests for nemotron_h (#9993)
- Use ucx as default backend (#10101)
- Fix OpenSearch URL in slurm_launch.sh for multinode perf sanity (#9990)
- Remove helix test from RTX test list (#10224)
- Add ray test robustness and RL perf reproduce script (#9939)
- Support multi-node disagg perf test in CI (#9138)
- Enable single-gpu CI on spark (#9304)
- Add disaggregated stress test (#9354)
- Include LongBenchV1 in trtllm-eval (eval infra aspect) (#10265)
- Fix port conflict avoidance in CI via get_free_port_in_ci (#10392)
What's Changed
- [https://nvbugs/5708810][fix] Fix TRTLLMSampler by @moraxu in #9710
- [TRTLLM-9641][infra] Use public triton 3.5.0 in SBSA by @ZhanruiSunCh in #9652
- [TRTLLM-8638][fix] Add failed cases into waives.txt by @xinhe-nv in #9979
- [TRTLLM-9794][ci] move more test cases to gb200 by @QiJune in #9994
- [None][feat] Add routing support for the new model for both cutlass and trtllm moe backend by @ChristinaZ in #9792
- [TRTLLM-8310][feat] Add Qwen3-VL-MoE by @yechank-nvidia in #9689
- [https://nvbugs/5731717][fix] fixed flashinfer build race condition during test by @MrGeva in #9983
- [FMDL-1222][feat] Support weight and weight_scale padding for NVFP4 MoE cutlass by @Wanli-Jiang in #9358
- [None][chore] Update internal_cutlass_kernels artifacts by @yihwang-nv in #9992
- [None][docs] Add README for Nemotron Nano v3 by @2ez4bz in #10017
- [None][infra] Fixing credential loading in lockfile generation pipeline by @yuanjingx87 in #10020
- [https://nvbugs/5727952][fix] a pdl bug in trtllm-gen fmha kernels by @PerkzZheng in #9913
- [None][infra] Waive failed test for main branch on 12/16 by @EmmaQiaoCh in #10029
- [None][doc] Update CONTRIBUTING.md by @syuoni in #10023
- [None][fix] Fix Illegal Memory Access for CuteDSL Grouped GEMM by @syuoni in #10008
- [TRTLLM-9181][feat] improve disagg-server prometheus metrics; synchronize workers' clocks when workers are dynamic by @reasonsolo in #9726
- [None][chore] Final mass integration of release/1.1 by @mikeiovine in #9960
- [None][fix] Fix iteration stats for spec-dec by @achartier in #9855
- [https://nvbugs/5741060][fix] Fix pg op test by @shuyixiong in #9989
- [https://nvbugs/5635153][chore] Remove responses tests from waive list by @JunyiXu-nv in #10026
- [None] [feat] Enhancements to slurm scripts by @kaiyux in #10031
- [None][infra] Waive failed tests due to llm model files by @EmmaQiaoCh in #10068
- [None][fix] Enabled simultaneous support for low-precision combine and MTP. by @yilin-void in #9091
- [https://nvbugs/5698434][test] Add Qwen3-4B-Eagle3 One-model perf test by @yufeiwu-nv in #10041
- [TRTLLM-9998][fix] Change trtllm-gen MoE distributed tuning strategy back to INDEPENDENT by @hyukn in #10036
- [TRTLLM-9989][fix] Disable tvm_ffi for CuteDSL nvFP4 dense GEMM. by @hyukn in #10040
- [None][chore] Remove unnecessary warning log for tuning. by @hyukn in #10077
- [TRTLLM-9680][perf] Optimize TRTLLMSampler log_probs performance (Core fix has been merged via #9353) by @tongyuantongyu in #9655
- [None][chore] Upgrade GitHub Actions for Node 24 compatibility by @salmanmkc in #10045
- [None][fix] Autodeploy: fix some legacy flashinfer attention test errors by @nvchenghaoz in #9928
- [None][fix] Revert GHA upgrade for blossom-ci workflow by @tburt-nv in #10095
- [None][chore] Clarify copyright header guidance by @tburt-nv in #9882
- [TRTLLM-9381][feat] Add kimi k2 fp4 tests by @xinhe-nv in #9906
- [None][chore] Add failed cases into waives.txt by @xinhe-nv in #10025
- [#7532][feat] AutoDeploy: gather logits before lm head by @lucaslie in #9962
- [https://nvbugs/5721644][fix] Update tests for nemotron_h by @Wanli-Jiang in #9993
- [None][infra] Update allowlist 2025.12.17 by @yuanjingx87 in #10097
- [None][fix] avoid ID conversion for non enable_configurable_moe cases. by @yuxianq in #10003
- [None][infra] Waive failed cases for main branch on 12/18 by @EmmaQiaoCh in #10105
- [https://nvbugs/5753250][infra] Waive _test_openai_responses. by @bobboli in #10110
- [None][infra] Fix slurm job does not catch cancelled jobs by @yuanjingx87 in #9722
- [None][feat] update TRT-LLM Gen MoE for NvFp4 + bias with tileN=256 by @nekorobov in #9734
- [None][perf] Add more optimization options for MOE CuteDSL finalized kernel by @sherry-1001 in #10042
- [https://nvbugs/5456493][feat] Add fp8 bmm on sm120 by @CarstyYou in #9687
- [TRTLLM-9819][perf] Reuse alltoall workspace for CuteDSL MoE output by @syuoni in #9840
- [https://nvbugs/5726066][fix] fix auto-scaling related failures by @reasonsolo in #9845
- [https://nvbugs/5747911][fix] Use offline data path for the unit test of mmencoder server by @chang-l in #10135
- [https://nvbugs/5741331][fix] Fix helix accuracy test by @brb-nv in #10021
- [TRTC-71][feat] Add regression testing for config database by @anish-shanbhag in #9832
- [https://nvbugs/5721912][chore] Unwaive the test by @hyukn in #10108
- [None][infra] Fix issue that lock file generation will skip dependency with comment by @yuanjingx87 in #10144
- [None][fix] Fix ready signal in NIXL backend by @chuangz0 in #10000
- [None][feat] Support Mooncake transfer engine as a cache transceiver backend by @wjueyao in #8309
- [TRTLLM-9840][test] switch ucx backend to default backend by @crazydemo in #10101
- [None][chore] Waive test blocking pre-merge 12/18 by @brb-nv in #10145
- [#9230][refactor] Replace nemotron patches with custom model implementation by @2ez4bz in #9751
- [None][chore] Update CODEOWNERS for test cases and test list by @LarryXFly in #10119
- [TRTLLM-7736][feat] Incrementally update the inputs of target and draft models by @ziyixiong-nv in #9708
- [https://nvbugs/5722653][fix] Address port conflict by assigning different port section in the same node. by @JunyiXu-nv in #10035
- [TRTLLM-8638][fix] Add failed cases into waives.txt by @xinhe-nv in #10132
- [TRTLLM-8830][test] Overlap scheduler enhancement perf test: Add qwen3_0,8b and llama3.1 test cases by @yufeiwu-nv in #10114
- [TRTLLM-9654][feat] Support DeepSeek-V32 chat template by @chang-l in #9814
- [TRTLLM-9604][feat] DS R1 & V3.1 tool parser by @LinPoly in #10010
- [None][infra] Update waive and waive failed tests for main branch on 12/19 by @EmmaQiaoCh in #10151
- [TRTLLM-8638][fix] Add failed cases into waives.txt by @xinhe-nv in #10129
- [#9640][feat] Migrate model registry to v2.0 format with composable configs by @tcherckez-nvidia in #9836
- [None][fix] waive the failed test test_service_discovery[etcd-load_ba… by @xxi-nv in #10161
- [https://nvbugs/5722653][fix] Unwaive fixed test by @JunyiXu-nv in #10157
- [TRTC-102][docs] --extra_llm_api_options -> --config in docs/examples/tests by @venkywonka in #10005
- [https://nvbugs/5720357][fix] Fix index offset overflow in custom Top-K kernel and corresponding UT case by @longcheng-nv in #10027
- [None][fix] Revert the change and remove device count guard for DSv32 by @chang-l in #9631
- [#10056][test] AutoDeploy: Add accuracy test for Nemotron SuperV3 by @galagam in #10131
- [None][chore] Waive timing out pre-merge test by @brb-nv in #10167
- [None][fix] enable KV cache reuse for config database by @anish-shanbhag in #10094
- [None][fix] fix draft_lengths for CUDA graph capture. by @yuxianq in #10004
- [https://nvbugs/5747930][fix] Use offline tokenizer for whisper models. by @yuxianq in #10121
- [TRTLLM-9992][perf] Enable PDL for CuteDSL kernels and overlap MoeOutputMemset by @syuoni in #10043
- [https://nvbugs/5643631][fix] Fix hostfunc seg fault by @syuoni in #10028
- [https://nvbugs/5753250][infra] Further waive all tests in _test_openai_responses.py by @bobboli in #10176
- [https://nvbugs/5744427][fix] Fix accuracy test OOM by @brb-nv in #10173
- [TRTLLM-9805][feat] Skip Softmax Attention. by @bobboli in #9821
- [None][chore] removed duplicated test from l0_b200.yml by @MrGeva in #10090
- [TRTLLM-9872][fix] clear the failed test at CI when enable_configurab… by @xxi-nv in #10067
- [None][infra] Waive failed tests for main branch on 12/21 by @EmmaQiaoCh in #10184
- [None] [feat] Enhancements to slurm scripts by @kaiyux in #10112
- [None][feat] Support Eagle3 on Mistral Large3 by @byshiue in #9971
- [https://nvbugs/5702793][fix] Fix view operation on uncontiguous tensor by @shuyixiong in #10147
- [None][feat] Cudagraph updates for helix parallelism by @brb-nv in #10141
- [None][ci] Waive GPTOSS test case by @syuoni in #10155
- [https://nvbugs/5701457][fix] Unwaive ray test. by @dominicshanshan in #10175
- [None][fix] disable cuda ipc on device without nvlink (L40s) for disagg test by @chuangz0 in #9735
- [https://nvbugs/5701445][chore] unwaive test. by @yuxianq in #9949
- [TRTLLM-9880][feat] Include torch compile tests in QA test list by @liji-nv in #10149
- [https://nvbugs/5684820][fix] fix the detokenizer issue for DeepSeek-v3.2 by @lfr-0531 in #10106
- [https://nvbugs/5666821][chore] unwaive tests. by @yuxianq in #9958
- [None][chore] Remove closed bugs by @xinhe-nv in #10182
- [None][fix] NVFP4 linear method's weight and weight_scale padding by @JadoTu in #10148
- [https://nvbugs/5762016][chore] Skip a ray test by @shuyixiong in #10194
- [https://nvbugs/5503479][fix] update trtllm-gen kernels to address few bugs by @PerkzZheng in #10089
- [None][refactor] simplify get_stats and get_kvcache_events with rpc by @Superjomn in #9980
- [None][chore] NVLinkOneSided AlltoAll Support zero local_num_tokens. by @bobboli in #9822
- [TRTLLM-9409][feat] Pass MRoPE tensors for EPD disagg by @2ez4bz in #9758
- [None][chore] Remove logprobs constraint on trtllm-serve pytorch backend by @LinPoly in #9911
- [None][infra] Waive failed cases on 12/22 by @EmmaQiaoCh in #10200
- [TRTLLM-7906][feat] Support multiple post process for Responses API by @JunyiXu-nv in #9908
- [#9717][chore] Refactor MoE code to use enums by @tcherckez-nvidia in #9910
- [TRTLLM-9847][fix] WAR fix hanging fused allreduce. by @greg-kwasniewski1 in #10087
- [TRTLLM-9677][feat] Support DeepSeek-V3.2 tool parser by @lfr-0531 in #10126
- [https://nvbugs/5747674][fix] Add contiguous() before view() in load_expert_w3_w1_weight and load by @farazkh80 in #10136
- [TRTLLM-9432][feat] Reduce synchronization and recompilation for qwen3-next by @yuantailing in #9691
- [None][fix] avoid implicit cudaStreamSynchronize in sample_async. by @yuxianq in #10120
- [TRTLLM-9989][fix] Fix tvm_ffi aaarch64 issue. by @limin2021 in #10199
- [None][chore] Fix GB300 support issues by @fredricz-20070104 in #10196
- [None][test] Add qa tests for RTX 6K by @pamelap-nvidia in #10210
- [TRTLLM-9091] [feat] Replace GenAI-Perf with AIPerf by @lkomali in #9310
- [None][chore] Add failed cases into waives.txt by @xinhe-nv in #10177
- [https://nvbugs/5741884][fix] unwaive disagg sampler by @chuangz0 in #10189
- [None][chore] Bump version to 1.2.0rc7 by @yiqingy0 in #10216
- [None][fix] Fix the bug for top_k=10 in NVLinkOneSided AlltoAll. by @bobboli in #10197
- [None][fix] Add OpenSearch URL in slurm_launch.sh for Multinode Perf Sanity Test by @chenfeiz0326 in #9990
- [https://nvbugs/5729697][fix] MNNVL Allreduce: use CUDA runtime instead of Macro to get SM version. by @timlee0212 in #10062
- [None][chore] Remove helix test from rtx test list by @brb-nv in #10224
- [https://nvbugs/5764627][chore] waive the time-out test by @hyukn in #10222
- [None] [feat] skip batch_tokenize_prompts in CustomDataset by @qiaoxj07 in #10214
- [https://nvbugs/5702786][fix] Fix race conditions in KV cache communication during unexpected termination by @RoeyAzran1992 in #10076
- [None][chore] Update AD coverage to use torch-cudagraph by @tcherckez-nvidia in #10233
- [None][infra] Waive flaky unittest/executor/test_rpc_proxy.py and unittest/executor/test_rpc_worker.py tests by @dongfengy in #10209
- [None][infra] Waive failed cases on 12/23 by @EmmaQiaoCh in #10236
- [TRTLLM-9565][fix] Fix deepseek sharding by @greg-kwasniewski1 in #9984
- [https://nvbugs/5680312][fix] Updated test waiving by @greg-kwasniewski1 in #9630
- [None][feat] Expose enable_trt_overlap in Triton_backend brings 1.05x OTPS by @jhaotingc in #10018
- [TRTLLM-9493][feat] Custom AllToAll for helix parallelism by @brb-nv in #9986
- [None][chore] Add failed cases into waives.txt by @xinhe-nv in #10204
- [None][infra] Waive failed cases on 12/24 by @EmmaQiaoCh in #10257
- [None][docs] Add NIXL-Libfabric Usage to Documentation by @zackyoray in #10205
- [TRTLLM-9798][feat] Change to use new DeepGEMM MQA sm100 kernel for MTP-3 by @lfr-0531 in #10226
- [TRTLLM-9615][feat] Support synchronization through PP ranks in the distributed tuning system by @hyukn in #10011
- [None][chore] Add failed cases into waives.txt by @xinhe-nv in #10240
- [TRTLLM-9737][chore] Add rl perf reproduce script and enhance the robustness of Ray tests by @shuyixiong in #9939
- [None][chore] Update tinygemm kernel name by @longlee0622 in #10248
- [#10246][feature] Move AD dashboard to use cudagraph compile backend by @tcherckez-nvidia in #10267
- [None][infra] Check GB200 coherent GPU mapping by @yiqingy0 in #10253
- [None][test] Add disag-serving auto scaling qa test by @StanleySun639 in #10262
- [#10052][feat] AutoDeploy enable cudagraphs for flashinfer BatchDecode by @suyoggupta in #10193
- [None][fix] fix: resolve GPU memory imbalance in concurrent weight loading by @Nekofish-L in #6472
- [#10137][feat] AutoDeploy FP8 MoE refactor by @nzmora-nvidia in #10138
- [TRTLLM-10143][feat] Reuse previous draft requests if possible by @ziyixiong-nv in #10263
- [TRTLLM-9862][infra] Move single-gpu tests on rtxpro6000d to pre-merge by @EmmaQiaoCh in #9897
- [#9241][feat] AutoDeploy: Support Eagle3 Speculative Decoding by @govind-ramnarayan in #9869
- [TRTC-121] [feat] Add recipe selector UI to complement the recipe database by @venkywonka in #10125
- [None][doc] Add Sparse Attention feature doc by @heyuhhh in #9648
- [None][chore] Add failed cases into waives.txt by @xinhe-nv in #10249
- [None][feat] Drop non-deepgemm fp8 block scale gemm by @lucifer1004 in #10256
- [None][ci] Waive TestLlama3_1_8B::test_auto_dtype[False-2] for timeout by @syuoni in #10293
- [None][chore] Remove NIM TRT-Backend Test Lists by @jieli-matrix in #10232
- [None][fix] Fix pageable H2D memcopy issue on GB200 by @qiaoxj07 in #10289
- [None][infra] Waive failed tests for main on 12/25 by @EmmaQiaoCh in #10298
- [None] [doc] Update IFB performance guide & GPTOSS deployment guide by @jgangani in #10283
- [TRTLLM-9578][feat] make PDL enabled by default by @dc3671 in #9695
- [None][infra] Move install_boost from install_triton.sh to install_base.sh by @Tabrizian in #10055
- [https://nvbugs/5752516][chore] unwaive test; fix port conflicts in CI by @reasonsolo in #10152
- [TRTLLM-9831][perf] Enable 2CTA with autotune for CuteDSL MoE and Grouped GEMM optimizations by @sherry-1001 in #10201
- [https://nvbugs/5652062][fix] Rewind kv_cache and reset draft tokens by @ziyixiong-nv in #10160
- [None][fix] Cherry-pick conflict changes for PR 7999 PR 8515 by @liji-nv in #9446
- [None][feat] Support VLM part for Mistral Large 3 by @byshiue in #10188
- [None][chore] Small refactoring to auto-deploy MoE operator by @nzmora-nvidia in #10300
- [None][fix] Allow YAML config overwriting CLI args for trtllm-eval by @syuoni in #10296
- [None][feat] Support multi-gpu running for nemotron-v3-nano and super by @Wanli-Jiang in #10118
- [None] [doc] Document perfect MoE router feature for perf analysis by @jgangani in #10303
- [https://nvbugs/5745152][fix] Fix some GPTOSS test setups by @dongfengy in #10085
- [https://nvbugs/5633700][fix] Cache tiktoken vocab for gpt-oss by @LinPoly in #10219
- [https://nvbugs/5747938][fix] Use local tokenizer by @LinPoly in #10230
- [TRTLLM-8952][feat] Support Multi-Node Disagg Perf Test in CI by @chenfeiz0326 in #9138
- [TRTLLM-7735][feat] Attention NVFP4 out support for torch compile by @liji-nv in #9740
- [None][fix] Fix request_id for best_of/n case by @evezhier in #8368
- [TRTLLM-10126][feat] Increase topk upper limit to 22 for NVLinkOneSid… by @nv-guomingz in #10229
- [TRTLLM-8577][feat] Clean the Qwen3-next code by removing Qwen3NextCo… by @nv-guomingz in #10228
- [None][fix] [Gemma3] Fix RoPE for local attention for Gemma3 by @shivghai in #9961
- [https://nvbugs/5753250][fix] Fix undefined local variable in responses utils by @JunyiXu-nv in #10154
- [TRTLLM-9962][feat] Some optimizations for two-model spec dec by @ziyixiong-nv in #10208
- [None][ci] Move remaining DGX-B200 tests to LBD by @chzblych in #9876
- [None][ci] Waive an intermittent test hang case by @chzblych in #10324
- [https://nvbugs/5625990][fix] Respect VSWA scheme when doing block store for reuse and load block for reuse in KV cache manager by @eopXD in #10183
- [None][infra] Some improvements for Slurm execution path in the CI by @chzblych in #10316
- [#9626][feat] Add an auto-deploy transform for using cutlass FP4 MoE kernels by @nzmora-nvidia in #10304
- [None][chore] Waive tests failing in pre-merge 12/28 by @brb-nv in #10311
- [None][infra] Remove duplicates in waives.txt by @EmmaQiaoCh in #10333
- [None][chore] Add failed cases into waives.txt by @xinhe-nv in #10321
- [None][chore] Add failed cases into waives.txt by @xinhe-nv in #10301
- [TRTLLM-9455][feat] support for new checkpoint by @binghanc in #10082
- [TRTLLM-9965][test] add long-context disagg test for GB300/GB200 and remove config_index in yaml by @ruodil in #10225
- [None][chore] Add failed cases into waives.txt by @xinhe-nv in #10302
- [https://nvbugs/5594703][infra] Unwaive the failed case to test by @EmmaQiaoCh in #10275
- [None][infra] Enable single-gpu CI on spark by @EmmaQiaoCh in #9304
- [None][infra] Waive failed cases for main on 12/30 by @EmmaQiaoCh in #10338
- [None][infra] Add LongBenchV1 to trtllm-eval. by @bobboli in #10265
- [https://nvbugs/5766986][fix] fixed the shard_all_unprocessed default value to align with the default.yml by @MrGeva in #10271
- [https://nvbugs/5769890][fix] Import get_free_port. by @yuxianq in #10341
- [None][feat] Implement send_object for TorchDist. by @yuxianq in #10213
- [https://nvbugs/5707359][fix] Unwaive OOM case that should be fixed by #9446 by @liji-nv in #10334
- [None][chore] Add failed cases into waives.txt by @xinhe-nv in #10344
- [None][fix] disable thread leak check for kimi by @xinhe-nv in #10337
- [None][chore] Unify DS tool parser names by @LinPoly in #10239
- [TRTLLM-10016][infra] Use SlurmPartition attribute time as timeout threshold by @yiqingy0 in #10254
- [https://nvbugs/5774869][chore] waive tests. by @yuxianq in #10356
- [https://nvbugs/5558516][test] add disaggregated stress test by @xinhe-nv in #9354
- [None][feat] support Qwen3-VL dense model in pytorch backend by @Nekofish-L in #9060
- [None][infra] Waive failed cases on 12/31 by @EmmaQiaoCh in #10353
- [https://nvbugs/5727475][fix] Avoid use property with setter in nn.Mo… by @liji-nv in #10212
- [#9717][chore] Standardize MoE weights interface by @tcherckez-nvidia in #10295
- [TRTLLM-9834][feat] Transfer to TRTLLM-INFRA Database and Fail post-merge tests if regression by @chenfeiz0326 in #10282
- [TRTLLM-8638][fix] Add failed cases into waives.txt by @xinhe-nv in #10354
- [https://nvbugs/5717993][fix] Add execution_stream across PyExecutor, KVCacheManager, PeftCacheManager to ensure proper CUDA stream synchronization between KV cache transfer operations and model forward kernels. by @SimengLiu-nv in #10060
- [None][feat] Implement sampling for MTP 1-model by @mikeiovine in #10019
- [#10244][feat] AutoDeploy: separate prefill/decode in flashinfer by @lucaslie in #10252
- [https://nvbugs/5740359][chore] Unwaive tests. by @yuxianq in #10260
- [https://nvbugs/5744427][fix] Make Gemma3 multimodal test fp8 by @brb-nv in #10368
- [None][chore] Waive tests blocking pre-merge 12/31 by @brb-nv in #10373
- [None][feat] Add export data to build and run script for AD by @tcherckez-nvidia in #10299
- [#10056][fix] AutoDeploy: Handle deletion of nested params in sharding by @galagam in #10376
- [TRTLLM-9467][fix] Fix PP+CP combination with helix parallelism by @brb-nv in #10312
- [None][test] Unified slurm extra args management and session collection logic by @fredricz-20070104 in #10332
- [None][fix] Minor updates on Perf Test System by @chenfeiz0326 in #10375
- [#10056][chore] AutoDeploy: Enable Nemo SuperV3 accuracy test by @galagam in #10308
- [None][feat] Eagle: MLA Based Eagle by @IzzyPutterman in #9677
- [None][doc] promote AutoDeploy to beta feature in docs by @lucaslie in #10372
- [TRTLLM-9752][fix] WAR: Disable PDL for quant kernels to fix accuracy issues by @bo-nv in #10285
- [None][infra] Waive failed cases on 1/3 by @EmmaQiaoCh in #10391
- [None][fix] Make NCCL resource manager destructor exception-safe by @nv-lschneider in #10166
- [TRTLLM-10358][feat] Added proper rescaling of FP4 weights by @greg-kwasniewski1 in #10378
- [None][infra] add retry logic to get slurm sbatch job log when ssh dropped by @yuanjingx87 in #9167
- [https://nvbugs/5748683][fix] Use get_free_port_in_ci to avoid port conflict. by @yuxianq in #10392
- [TRTLLM-10171][fix] Correct attention handling in ModelConfig and KVCacheManager by @jaedeok-nvidia in #10330
- [TRTLLM-7138][feat] Support nvfp4 for gptoss by @dongfengy in #8956
- [None][ci] Some tweaks for the CI pipeline by @chzblych in #10359
- [None][fix] Decrease Pre Merge Perf Tests by @chenfeiz0326 in #10390
New Contributors
- @salmanmkc made their first contribution in #10045
- @sherry-1001 made their first contribution in #10042
- @LarryXFly made their first contribution in #10119
- @longcheng-nv made their first contribution in #10027
- @lkomali made their first contribution in #9310
- @RoeyAzran1992 made their first contribution in #10076
- @jgangani made their first contribution in #10283
- @shivghai made their first contribution in #9961
Full Changelog: v1.2.0rc6...v1.2.0rc7