NVIDIA/TensorRT-LLM v1.2.0rc6.post1 on GitHub

Security Vulnerabilities

GnuPG Vulnerability

A security vulnerability has been identified in GnuPG versions prior to 2.4.9. An affected GnuPG version ships with Ubuntu 24.04 LTS, which the TensorRT LLM base image uses. For details, see the official Ubuntu advisory: CVE-2025-68973. An official patched package for Ubuntu is still pending; the fix will be included in the next release once the updated package is published and incorporated. To mitigate the risk immediately, manually upgrade GnuPG to version 2.4.9 or later.
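As a rough aid for the manual mitigation, the following Python sketch reads the upstream version reported by `gpg --version` and compares it with 2.4.9. The helper names (`parse_gnupg_version`, `is_patched`, `check_local_gnupg`) are illustrative and not part of TensorRT LLM. The check assumes the standard `gpg (GnuPG) X.Y.Z` banner. Distribution packages may backport fixes without changing the upstream version number, so treat the result as a hint rather than proof of exposure.

```python
import re
import shutil
import subprocess

# First fixed upstream version, as stated in the advisory text above.
MIN_GNUPG = (2, 4, 9)


def parse_gnupg_version(version_output):
    """Return (major, minor, patch) parsed from `gpg --version` output, or None."""
    match = re.search(r"gpg \(GnuPG[^)]*\) (\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_patched(version, minimum=MIN_GNUPG):
    """True if the parsed version is at least the minimum fixed version."""
    return version is not None and version >= minimum


def check_local_gnupg():
    """Return the local gpg version tuple, or None if gpg is missing or unparsable."""
    gpg = shutil.which("gpg")
    if gpg is None:
        return None
    result = subprocess.run(
        [gpg, "--version"], capture_output=True, text=True, check=False
    )
    return parse_gnupg_version(result.stdout)
```

Inside the container, `is_patched(check_local_gnupg())` returns `False` when gpg is absent, unparsable, or reports an upstream version below 2.4.9.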

Hugging Face Transformers Vulnerabilities

Several security vulnerabilities have been disclosed in the Hugging Face Transformers library used in TensorRT LLM. Because these issues originate in an upstream dependency, remediation depends on the Hugging Face team releasing a patch. We are actively monitoring the situation and will update TensorRT LLM to include the necessary fixes once a stable release of the Transformers library that addresses these vulnerabilities becomes available.

Affected CVEs: CVE-2025-14920, CVE-2025-14921, CVE-2025-14924, CVE-2025-14927, CVE-2025-14928, CVE-2025-14929, CVE-2025-14930

What's Changed

[https://nvbugs/5708810][fix] Fix TRTLLMSampler by @moraxu in #9710
[TRTLLM-9641][infra] Use public triton 3.5.0 in SBSA by @ZhanruiSunCh in #9652
[TRTLLM-8638][fix] Add failed cases into waives.txt by @xinhe-nv in #9979
[TRTLLM-9794][ci] move more test cases to gb200 by @QiJune in #9994
[None][feat] Add routing support for the new model for both cutlass and trtllm moe backend by @ChristinaZ in #9792
[TRTLLM-8310][feat] Add Qwen3-VL-MoE by @yechank-nvidia in #9689
[https://nvbugs/5731717][fix] fixed flashinfer build race condition during test by @MrGeva in #9983
[FMDL-1222][feat] Support weight and weight_scale padding for NVFP4 MoE cutlass by @Wanli-Jiang in #9358
[None][chore] Update internal_cutlass_kernels artifacts by @yihwang-nv in #9992
[None][docs] Add README for Nemotron Nano v3 by @2ez4bz in #10017
[None][infra] Fixing credential loading in lockfile generation pipeline by @yuanjingx87 in #10020
[https://nvbugs/5727952][fix] a pdl bug in trtllm-gen fmha kernels by @PerkzZheng in #9913
[None][infra] Waive failed test for main branch on 12/16 by @EmmaQiaoCh in #10029
[None][doc] Update CONTRIBUTING.md by @syuoni in #10023
[None][fix] Fix Illegal Memory Access for CuteDSL Grouped GEMM by @syuoni in #10008
[TRTLLM-9181][feat] improve disagg-server prometheus metrics; synchronize workers' clocks when workers are dynamic by @reasonsolo in #9726
[None][chore] Final mass integration of release/1.1 by @mikeiovine in #9960
[None][fix] Fix iteration stats for spec-dec by @achartier in #9855
[https://nvbugs/5741060][fix] Fix pg op test by @shuyixiong in #9989
[https://nvbugs/5635153][chore] Remove responses tests from waive list by @JunyiXu-nv in #10026
[None][feat] Enhancements to slurm scripts by @kaiyux in #10031
[None][infra] Waive failed tests due to llm model files by @EmmaQiaoCh in #10068
[None][fix] Enabled simultaneous support for low-precision combine and MTP. by @yilin-void in #9091
[https://nvbugs/5698434][test] Add Qwen3-4B-Eagle3 One-model perf test by @yufeiwu-nv in #10041
[TRTLLM-9998][fix] Change trtllm-gen MoE distributed tuning strategy back to INDEPENDENT by @hyukn in #10036
[TRTLLM-9989][fix] Disable tvm_ffi for CuteDSL nvFP4 dense GEMM. by @hyukn in #10040
[None][chore] Remove unnecessary warning log for tuning. by @hyukn in #10077
[TRTLLM-9680][perf] Optimize TRTLLMSampler log_probs performance (Core fix has been merged via #9353) by @tongyuantongyu in #9655
[None][chore] Bump version to 1.2.0rc6.post1 by @yiqingy0 in #10484

Full Changelog: v1.2.0rc6...v1.2.0rc6.post1