Announcement Highlights
Model Support
- Optimize the routing kernel for DeepSeek V3; add MoE TRTLLM backend support for Kimi K2 and Qwen-next (#7761)
- Support DeepSeek V3.2 with FP8/BF16 KV cache and NVFP4/BF16 KV cache (#8405; see the usage sketch after this list)
- Add EVS support for nano-v2-vlm (#8024)
- Support Qwen3 reasoning and tool parsers (#8000, #8216)
- Add Nemotron MOE support in AutoDeploy, including FP8 MOE (#8469, #8737, #8599)
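As a quick orientation for the model-support items above, here is a minimal sketch of loading one of the newly supported models through the Python LLM API. The checkpoint id and KV-cache settings are assumptions for illustration, not taken from the release notes:

```python
# Minimal sketch (assumed checkpoint id and settings): load a newly
# supported model with the LLM API and generate a completion.
from tensorrt_llm import LLM
from tensorrt_llm.llmapi import KvCacheConfig

llm = LLM(
    model="deepseek-ai/DeepSeek-V3.2-Exp",  # assumption: any supported checkpoint
    kv_cache_config=KvCacheConfig(
        free_gpu_memory_fraction=0.8,  # fraction of free GPU memory for KV cache
    ),
)
outputs = llm.generate(["Summarize this release in one sentence."])
print(outputs[0].outputs[0].text)
```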
API
Feature
- Add cuBLASLt NVFP4 GEMM backend (#7943)
- Add FP8 rowwise GEMMs for B200 (#8332)
- Enable low-precision alltoall for CUTLASS/TRTLLMGen (#8675)
- Integrate MNNVL Throughput and refactor allreduce kernel for TRTLLM MoE (#8728, #8018)
- Enable RMS norm fusion for Nemotron MOE (#8563)
- Add base64 video input support (#8458)
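For the base64 video input above (#8458), a hedged sketch of what a client request might look like follows, assuming the input arrives through the OpenAI-compatible trtllm-serve endpoint; the `video_url` content part and the model name are assumptions to be checked against the serve docs:

```python
# Hedged sketch: send a base64-encoded video as a data URI to an
# OpenAI-compatible trtllm-serve endpoint. Field names are assumptions.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

with open("clip.mp4", "rb") as f:
    video_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="nano-v2-vlm",  # placeholder: the model the server was launched with
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this video."},
            {"type": "video_url",
             "video_url": {"url": f"data:video/mp4;base64,{video_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```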
Fix & Infra
- Upgrade to DLFW 25.10, PyTorch 2.9.0, and Triton 3.5.0 (#8838)
- Fix FP8 blockwise GEMM performance with attention DP (#8501)
- Fix pipeline-parallel bubbles (#8687)
- Cache the AllReduce wrapper to avoid re-allocation hangs (#8803)
- Stabilize tests/CI with waives and slurm/CI updates (#8524, #8573, #8749, #8775, #8808, #8896, #8897)
Benchmark
Documentation
Known issue
- For this pre-release version, install using the specific version identifier: `pip3 install tensorrt-llm==1.2.0rc2`. Installing with `pip3 install tensorrt-llm --pre` will result in a broken dependency on `onnx==1.20.0rc1`. This issue will be resolved in the next release.
What's Changed
- [None][chore] update test duration by @xinhe-nv in #8377
- [None][fix] Avoid overwrite of `kv_cache_config.max_tokens` for VSWA scheme for the KVCacheManager by @eopXD in #8219
- [TRTLLM-4866] [test] Support waiving unit tests by waives.txt by @VALLIS-NERIA in #8359
- [TRTLLM-8637][feat] Optimize the routing kernel for DeepseekV3 (MoE CUTLASS backend); Add support for 384 experts (MoE TRTLLM backend) by @ChristinaZ in #7761
- [https://nvbugs/5542862][fix] Upgrade fmha_v2. by @yuxianq in #8364
- [TRTLLM-8669][infra] Use artifactory mirror for install python by @ZhanruiSunCh in #8394
- [TRTLLM-7255][feat] Add iteration log parser script for benchmark log by @yizhang-nv in #6942
- [None][ci] move some test cases from H100 to A10 by @QiJune in #8449
- [TRTLLM-8436][feat] batched sampling and top-k logprobs improvements by @ixlmar in #8398
- [None][feat] Update devcontainer configuration to include additional extensions by @Funatiq in #8369
- [https://nvbugs/5540752][fix] Support quantized Phi4 MM models by @pamelap-nvidia in #8190
- [https://nvbugs/5492250][fix] Remove isolated cases and unwaive cases by @HuiGao-NV in #8492
- [TRTLLM-6055][infra] Slurm Test refactor by @yuanjingx87 in #7176
- [https://nvbugs/5568676][fix] Remove test waive by @dongfengy in #8437
- [#8461][feat] AutoDeploy: trtllm-serve bug fix + unit test by @lucaslie in #8462
- [None] [chore] Add architecture-specific ATTRIBUTIONS files by @venkywonka in #8468
- [#8272][feat] Enable chunked prefill for SSMs in AutoDeploy by @suyoggupta in #8477
- [None][feat] Update 3rdparty/DeepGEMM to latest commit by @ruoqianguo in #8488
- [None][feat] Support kv_cache_reuse for HyperCLOVAX-Vision model by @yechank-nvidia in #7789
- [TRTLLM-8436][fix] restore list[list[list[int]]] in add_token by @ixlmar in #8502
- [None][chore] Move submit.sh to python and use yaml configuration by @zerollzeng in #8003
- [TRTLLM-7287][test] add multimodal chunked_prefill cases by @ruodil in #8011
- [None][feat] Add alltoall to trtllm-gen MoE backend. by @bobboli in #8481
- [TRTLLM-8638][fix] Add failed cases into waives.txt by @xinhe-nv in #8486
- [None][ci] rebalance H100 stages by @QiJune in #8491
- [None][feat] Support Qwen3 reasoning parser by @LinPoly in #8000
- [None][infra] Add split algorithm for slurm by @EmmaQiaoCh in #8516
- [TRTLLM-8638][fix] Remove closed bugs by @xinhe-nv in #8478
- [None][chore] Update feature combination matrix for SWA kv cache reuse by @eopXD in #8529
- [None][fix] the api_stability unify default values of None and inspect._empty by @Superjomn in #8496
- [None][infra] Waive failed tests for main 10/21 by @EmmaQiaoCh in #8524
- [None][doc] Facilitates the integration of the transfer agent by @Shixiaowei02 in #7867
- [TRTLLM-8160][feat] Add max_total_draft_tokens by @yweng0828 in #8366
- [None][chore] AutoDeploy: replace HF's deprecated keyword torch_dtype --> dtype by @lucaslie in #8510
- [TRTLLM-7843][feat] implement disagg cluster auto-scaling by @reasonsolo in #8215
- [None][feat] AutoDeploy: Add Nemotron MOE support for AutoDeploy by @nvchenghaoz in #8469
- [TRTLLM-8483][chore] Refine scheduler_config and peft_cache_config in create_py_executor by @leslie-fang25 in #8451
- [https://nvbugs/5556020][fix] test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3 dimension mismatch by @sunnyqgg in #8517
- [None][doc] Fix the incorrect doc figure by @Shixiaowei02 in #8536
- [TRTLLM-8260][feat] Add Server-Client Perf Test in pytest for B200 and B300 by @chenfeiz0326 in #7985
- [None][infra] Let CI continue running other isolation tests when an isolation test get hanging by @EmmaQiaoCh in #8471
- [TRTLLM-8638][fix] Add failed cases into waives.txt by @xinhe-nv in #8554
- [None][feat] Add vLLM KV Pool support for XQA mla kernel by @qsang-nv in #8560
- [https://nvbugs/5451272][fix] unwaive the test by @Shixiaowei02 in #8537
- [None][chore] Bump version to 1.2.0rc2 by @yiqingy0 in #8562
- [None][doc] Paragraph adjustment and fix statistic by @yunruis in #8568
- [None][infra] Waive failed cases for main branch 10/22 by @EmmaQiaoCh in #8573
- [TRTLLM-8785][fix] fix conflicts between periodic-junit and store-durations by @crazydemo in #8518
- [https://nvbugs/5594753][fix] fix rpc unique addr related issue by @Superjomn in #8419
- [#8391][fix] check perf by device subtype by @MrGeva in #8428
- [None][chore] replace print_colored_debug with logger_debug by @Superjomn in #8417
- [None][fix] generate nanobind stubs for submodules by @ixlmar in #8539
- [None][fix] fixed cached model path in test by @MrGeva in #8549
- [None][chore] add precommit hook to remove redundant tab and white space by @xinhe-nv in #8534
- [https://nvbugs/5429636][feat] Kv transfer timeout by @pcastonguay in #8459
- [None][fix] Fix EPLB CPU thread NUMA binding by @dongxuy04 in #8579
- [None][chore] Skip failing import of mxfp4_moe by @brb-nv in #8591
- [TRTLLM-8754][chore] Refine PyTorchModelEngine with llm args by @leslie-fang25 in #8493
- [TRTLLM-8682][chore] Remove auto_parallel module by @anish-shanbhag in #8329
- [None][feat] Update TRTLLM MoE MxFP4 cubins; autotune tileN by @rosenrodt in #8156
- [TRTLLM-7954][feat] Target model KV cache reallocation by @sunnyqgg in #8421
- [https://nvbugs/5604136][fix] AutoDeploy: correct import for mxfp4_moe unit test by @lucaslie in #8593
- [None][fix] Allow multi-threaded copy for GDRCopy wrapper by @dongxuy04 in #8535
- [None][feat] Enable rms norm fusion for Nemotron MOE by @suyoggupta in #8563
- [None][feat] add Nemotron-Ultra multi nodes eval tests by @xinhe-nv in #8577
- [https://nvbugs/5451205][feat] Add cuBLASLt NVFP4 GEMM backend support by @Wong4j in #7943
- [TRTLLM-8714][fix] update create_input_processor to handle custom checkpoint format by @Funatiq in #7811
- [None][infra] Fix slurm exitcode by @EmmaQiaoCh in #8585
- [None][infra] Disable rtxpro6000 stages due to nodes will be offline by @EmmaQiaoCh in #8613
- [TRTLLM-8812][chore] Limit the scope of pybind based CacheTransceiverConfig by @QiJune in #8558
- [None][chore] Cleanup GDS code by @achartier in #8475
- [#4585][feat] Replace unified attention before export by @h-guo18 in #8303
- [TRTLLM-8817][chore] Set default value of KvCacheConfig.free_gpu_memory_fraction explicitly by @QiJune in #8561
- [None][doc] add visualization of perf metrics in time breakdown tool doc by @zhengd-nv in #8530
- [None][feat] Support base64 video input by @Wanli-Jiang in #8458
- [TRTLLM-8638][fix] Add flaky failed cases into waives.txt by @xinhe-nv in #8588
- [TRTLLM-8638][fix] Add failed cases into waives.txt by @xinhe-nv in #8590
- [None][infra] enable lfs for generateLockFile pipeline by @yuanjingx87 in #8547
- [TRTLLM-8738][test] Add end-to-end trtllm-serve negative tests by @StanleySun639 in #8580
- [None][test] remove redundant runtime backend in perf test by @ruodil in #8358
- [None][chore] Add failed cases into waives.txt by @xinhe-nv in #8630
- [TRTLLM-8638][fix] fix test issues by @xinhe-nv in #8557
- [None][infra] Waive tests on main and remove lines which missed in MI by @EmmaQiaoCh in #8639
- [None][chore] Disable GB300 stages due to nodes will be offline temporarily by @yiqingy0 in #8643
- [None][feat] add skip condition in AutoDeploy's triton fused moe kernel by @suyoggupta in #8632
- [TRTLLM-7078][chore] optimal kvcache transfer for VSWA by @chuangz0 in #7952
- [None][feat] Pass KvCacheRetentionConfig to torch LlmRequest by @achartier in #8634
- [TRTLLM-8737][feat] Support media_io_kwargs on trtllm-serve by @yechank-nvidia in #8528
- [TRTLLM-8535][feat] Support DeepSeek V3.2 with FP8 + BF16 KV cache/NVFP4 + BF16 KV cache by @chang-l in #8405
- [None][feat] Support KV Connector with Disagg Prefill Worker by @jthomson04 in #8246
- [TRTLLM-8513][feat] Add back worker extension by @hchings in #8482
- [https://nvbugs/5494718][fix] Fix Single GPU Multi-node issue and OOM on DGX Spark by @SimengLiu-nv in #8514
- [TRTLLM-8238][feat] Add EVS support for nano-v2-vlm by @Wanli-Jiang in #8024
- [None][feat] AutoDeploy: Add FP8 MOE for Nemotron by @nvchenghaoz in #8599
- [None][infra] Waive failed case on main 10/26 by @EmmaQiaoCh in #8668
- [None][fix] Fix the performance issue of FP8 blockwise grouped GEMM when using attention DP by @jinyangyuan-nvidia in #8501
- [None][ci] move some time-consuming benchmark test cases to post merge by @QiJune in #8641
- [None][fix] Fix ModelConfig.from_pretrained get quant config file by @yuantailing in #8647
- [TRTLLM-8638][fix] Add failed cases into waives.txt by @xinhe-nv in #8669
- [None][docs] Update Python wheel's short-/long-descriptions by @chzblych in #8676
- [TRTLLM-8638][fix] Add failed cases into waives.txt by @xinhe-nv in #8672
- [TRTLLM-7159][docs] Add documentation for additional outputs by @Funatiq in #8325
- [https://nvbugs/5546507][fix] skip test_mistral_nemo_eagle_1gpu test cases due to CMake Error in building by @jieli-matrix in #8677
- [None][feat] Add opentelemetry tracing by @zhanghaotong in #5897
- [None][test] Add longbench v2 for long context evaluation by @baize97 in #8604
- [None] [test] Add MNNVL AlltoAll tests to pre-merge by @kaiyux in #8601
- [TRTLLM-8933][chore] remove unused update_executor_config function by @QiJune in #8678
- [TRTLLM-8832][feat] fully async _select_generated_logits with tests by @ixlmar in #8628
- [None][feat] Autodeploy: Update the ssm to use slice by @nvchenghaoz in #8667
- [None][feat] Support ignored prompt length for penalties via new sampling config parameter by @nvxuanyuc in #8127
- [TRTLLM-7318][feat] MnnvlThroughput AlltoAll implementation. by @bobboli in #7499
- [None][feat] Add FP8 rowwise GEMMs for B200 by @achartier in #8332
- [None][infra] Skip failed tests for main 10/27 by @EmmaQiaoCh in #8686
- [https://nvbugs/5608723][fix] Use local data on multimodal tests and unwaive tests by @yechank-nvidia in #8673
- [#8245][feat] Autodeploy: Guided Decoding Support by @govind-ramnarayan in #8551
- [None][infra] Minor Update on Perf Sanity Testdb Files by @chenfeiz0326 in #8607
- [None][chore] ISOLATE some cases by @HuiGao-NV in #8690
- [None][chore] Use a cached model path for Ray integration test by @achartier in #8660
- [https://nvbugs/5556475] [fix] Fix the `tensorrt_llm_bls` model to correctly return the outputs for `num_input_tokens` and `num_output_tokens` by @pskiran1 in #8150
- [None][fix] Change Ray submit() to use async RPC by @hchings in #8636
- [None][test] Add gpt_oss_20b Model to Sanity Perf Test by @yufeiwu-nv in #8265
- [TRTLLM-7835][test] add default sample config for perf test by @ruodil in #8523
- [https://nvbugs/5564465][test] ensure deepseek_v3_lite isl + osl < max_seq_len by @ruodil in #8565
- [None][fix] fix EPLB init hang by @dongxuy04 in #8649
- [#8694][fix] fix AutoDeploy cuda memory access failure in NemotronHMoe by @MrGeva in #8696
- [None][chore] Revert "[TRTLLM-7835][test] add default sample config for perf test (#8523)" by @Funatiq in #8725
- [https://nvbugs/5575913][fix] Use separate thresholds for 120b/20b gptoss by @dongfengy in #8664
- [None][fix] Properly raise error for nemotron H models by @2ez4bz in #8697
- [TRTLLM-8684][chore] Migrate BuildConfig to Pydantic, add a Python wrapper for KVCacheType enum by @anish-shanbhag in #8330
- [https://nvbugs/5606166][fix] AutoDeploy: use tuples for cudagraph shape lookup by @lucaslie in #8658
- [https://nvbugs/5596343][test] Update test waive to get back some coverage by @dongfengy in #8702
- [https://nvbugs/5549111][fix] Fix 2-model overlap scheduler accuracy on very long prompts by @mikeiovine in #8076
- [TRTLLM-8827] [feat] Enable low precision alltoall for Cutlass and TRTLLMGen backends by @kaiyux in #8675
- [TRTLLM-8431][doc] update public doc and example, add etcd auto-scaling tests by @reasonsolo in #8602
- [https://nvbugs/5596377][fix] Fix mm dummy calculation by @yechank-nvidia in #8498
- [https://nvbugs/5612529][fix] Fix transferAgent_test by @chuangz0 in #8710
- [None][test] fix a typo in perf test sampler config by @ruodil in #8726
- [None][feat] add detailed KV cache transfer time breakdown by @zhengd-nv in #8521
- [None][ci] waive test_rpc.py temporarily by @Superjomn in #8743
- [https://nvbugs/5613456][chore] Skip test_ptp_quickstart_advanced_multi_gpus[DeepSeek-V3-671B-FP8-DeepSeek-V3-0324-8] due to Model Creation OOM by @xinhe-nv in #8684
- [https://nvbugs/5549829][fix] Qwen2.5-VL TP > 1 + Quantized weight load fix by @yechank-nvidia in #8680
- [None][feat] Enable nvfp4 cuda core for sm120 by @Njuapp in #8620
- [None][perf] Use fp8 quant kernel in DS3.2 indexer module by @chang-l in #8701
- [https://nvbugs/5593199][test] Enhance beam search tests deterministic dummy model by @stnie in #8625
- [None][feat] add flag for EPLB to force using GDRCopy by @dongxuy04 in #8650
- [None][fix] add readme copy to wheel stage to avoid setup.py failure by @farazkh80 in #8736
- [None][infra] update ci allow list 2025/10/29 by @niukuo in #8749
- [TRTLLM-8214][feat] Support Qwen3 tool parser by @LinPoly in #8216
- [https://nvbugs/5607238][test] fix working dir in disagg worker test by @zhengd-nv in #8648
- [https://nvbugs/5550409][fix] Disable torch compile in piecewise attention part to Avoid host overhead by @yizhang-nv in #8708
- [None][chore] Enable GPQA in CI for DeepSeek V3.2 by @chang-l in #8712
- [https://nvbugs/5534574][fix] disable spec decoding forever once the request spec decoding is disabled by @kris1025 in #8446
- [None][fix] fix config loading for DeepSeek-V3.2 in trtllm-bench by @lfr-0531 in #8729
- [None][ci] waive test_rpc.py by @Superjomn in #8745
- [TRTLLM-8763][chore] Deprecate pybind based GuidedDecodingConfig usage in torch backend by @leslie-fang25 in #8717
- [None][infra] Waive failed tests on main 10/29 by @EmmaQiaoCh in #8759
- [TRTLLM-8976][feat] Move indexer-k-cache to KVCacheManager by @Tabrizian in #8699
- [None][infra] Check in most recent lock file from nightly pipeline by @yuanjingx87 in #8739
- [https://nvbugs/5599086][fix] Fix FP8 Linear module for spark by @SimengLiu-nv in #8707
- [None][doc] Minor doc update to disagg-serving by @schetlur-nv in #8768
- [https://nvbugs/5547414][fix] Use cached models by @HuiGao-NV in #8755
- [None][fix] Fix UnboundLocalError. by @yuxianq in #8756
- [TRTLLM-8971][infra] Update gpu key for B300/GB300 by @EmmaQiaoCh in #8724
- [None][doc] Add doc for torch.compile & piecewise cuda graph by @yizhang-nv in #8527
- [None][infra] Unwaive the tests passed in latest CI and disable a perf stage by @EmmaQiaoCh in #8775
- [None][infra] fix slurm results path by @yuanjingx87 in #8751
- [None][fix] fix runtime error that bf16 input is not quantized to nvfp4 when use bf16 dispatch by @yilin-void in #8507
- [https://nvbugs/5608461][fix] exclude InductorSubproc from thread leak check by @leslie-fang25 in #8704
- [https://nvbugs/5481206][fix] update waives by @xinhe-nv in #8774
- [None][feat] Refactor scaffolding streaming feature and fix openai wo… by @WeiHaocheng in #8622
- [None][feat] Add unit tests and revision in block_level kernel for invalid input by @ChristinaZ in #8718
- [TRTLLM-5453][infra] Check all steps for test name and also check the test in waives.txt also exists in l0 or qa test list. by @EmmaQiaoCh in #6256
- [None][feat] Autotuner can iterate through all tactics for test purposes by @rosenrodt in #8663
- [None][feat] Add layer wise benchmarks by @yuantailing in #8777
- [TRTLLM-8930][infra] Force Blossom perf test stages to use 'tensorrt/test_type: perf' in the K8S template by @ZhanruiSunCh in #8752
- [None][infra] Waive failed case for main branch by @EmmaQiaoCh in #8797
- [None][fix] Layer-wise benchmarks: use local models, lint by @yuantailing in #8799
- [TRTLLM-8734][feat] AutoDeploy: Enable the nvfp4 for Nemotron MOE by @nvchenghaoz in #8737
- [TRTLLM-8978][test] Remove llama 4 spec dec tests by @mikeiovine in #8766
- [None][infra] Update allow list 20251030 by @yuanjingx87 in #8808
- [https://nvbugs/5575687][fix] fix moe_gemm's preexit position that cause illegal memory access by @dc3671 in #8786
- [None][feat] Add disagg relay time to time breakdown tool by @nv-yilinf in #8465
- [https://nvbugs/5599515][fix] Fix PP bubbles. by @yuxianq in #8687
- [TRTLLM-9003][infra] Add python OpenSearchDB query / push. by @ZhanruiSunCh in #8506
- [https://nvbugs/5623960][fix] Compress the warning log of AutoTuner when encountering tactic failures. by @hyukn in #8793
- [None][chore] use cached vila model by @HuiGao-NV in #8788
- [None][info] Waive failed case for main by @EmmaQiaoCh in #8826
- [https://nvbugs/5617275][fix] Extract py files from prebuilt wheel for editable installs by @chang-l in #8738
- [None][fix] Waive layer-wise benchmark tests by @yuantailing in #8823
- [https://nvbugs/5558117][fix] Allow per-layer quant config from hf_quant_config.json by @rosenrodt in #8617
- [None][perf] AutoDeploy optimize _get_unique_value by @suyoggupta in #8822
- [https://nvbugs/5614506][chore] Adding e+p+d e2e test by @pcastonguay in #8801
- [None][doc] Clarify the perf best practice and supported hardware for gptoss by @dongfengy in #8665
- [https://nvbugs/5474119][fix] Re-enable test by @dongfengy in #8809
- [TRTINFRA-7215][infra] Add support for enroot SLURM clusters by @mlefeb01 in #8770
- [TRTLLM-8541][feat] Add trtllm-gen sparse MLA kernels to support per-Tensor FP8 KV Cache by @lfr-0531 in #8692
- [TRTLLM-7008][fix] Enable GDRCopy and unwaive online eplb tests by @dongxuy04 in #8720
- [TRTLLM-7731][feat] Avoid over-allocation of KV cache for transmission in disagg with CP by @brb-nv in #8145
- [TRTLLM-8836][chore] Create ModelEngine from LlmArgs by @QiJune in #8600
- [None][fix] Rename: slot_count -> invalid_expert_id by @bobboli in #8783
- [https://nvbugs/5437384][test] cherry-pick fix trtllm llmapi launch multi tests by @Superjomn in #8567
- [None][feat] Use ruff for formatting and linting new files by default by @Funatiq in #8629
- [None][fix] WAR for tensorrt depending on the archived nvidia-cuda-runtime-cu13 package by @chzblych in #8857
- [#8781][fix] Cache the AllReduce wrapper to avoid re-allocating workspace which caused a hang by @MrGeva in #8803
- [None][test] Enhance GPT-OSS CI with GPQA Diamond and additional Spec Decoding Test by @dongfengy in #8661
- [None][feat] Fix attention sink load in xqa by @qsang-nv in #8836
- [https://nvbugs/5625380][chore] Remove multimodal related fields from decoder llm input by @chang-l in #8846
- [TRTLLM-8991][test] Add Llama 3.3 70B model with different performance config by @yufeiwu-nv in #8753
- [None][update] optimized sparse mla kernels && fix unspecified cuda launch by @PerkzZheng in #8866
- [None][feat] Add benchmark to DeepConf by @dcaox in #8776
- [TRTLLM-8825][feat] Support Pytest Perf Results uploading to Database by @chenfeiz0326 in #8653
- [https://nvbugs/5523315][fix] Fix serve benchmark test by @yechank-nvidia in #8255
- [https://nvbugs/5625962][chore] unwaive DS-v32-fp4 tests by @lfr-0531 in #8853
- [None][chore] Add failed cases into waives.txt by @xinhe-nv in #8872
- [None][doc] Add LLM-API API change principle by @Superjomn in #8350
- [TRTLLM-9073/9087][doc] Add the missing content for model support section and fix valid links for long_sequence.md by @nv-guomingz in #8869
- [https://nvbugs/5521799][fix] add harmony channel validation by @xinhe-nv in #8837
- [None][fix] Fix import issues in layer-wise benchmarks by @yuantailing in #8827
- [None][infra] Waive the failed test for main on 11/3 by @EmmaQiaoCh in #8875
- [TRTLLM-8435][infra] Test existing rtxpro6000 stages on rtxpro6000d by @EmmaQiaoCh in #8319
- [TRTLLM-6928][fix] Refactor multimodal unittest by @yechank-nvidia in #8453
- [None][fix] Fix cute dsl nvfp4 gemm autotune issue by @limin2021 in #8761
- [None][chore] Add sample yaml for wide-ep example and minor fixes by @kaiyux in #8825
- [TRTINFRA-7215][infra] - Move half of the DGX H100 premerge tests to SLURM by @mlefeb01 in #8849
- [TRTLLM-8979][test] Improve qwen3 spec dec test coverage by @mikeiovine in #8767
- [TRTLLM-5966][feat] Helix: add full MLA support for Helix by @MatthiasKohl in #8104
- [TRTLLM-8680][doc] Add table with one-line deployment commands to docs by @anish-shanbhag in #8173
- [None][chore] Optimize perf for the RPC executor and add some profile utilities to llm-api by @Superjomn in #8415
- [TRTLLM-8690][feat] add more tensors to share buffers by @HuiGao-NV in #8691
- [None][infra] waive failed test on main 11/4 by @ZhanruiSunCh in #8896
- [None][infra] Waive failed tests for main branch by @EmmaQiaoCh in #8897
- [None][fix] InputProcessor config naming convention fix by @yechank-nvidia in #8705
- [https://nvbugs/5625990][chore] Add test coverage for current incapability of the KV cache manager by @eopXD in #8829
- [None][chore] Weekly mass integration of release/1.1 by @mikeiovine in #8508
- [TRTLLM-1234][feat] Add fp8 blockscaled Gemm for sm120 by @CarstyYou in #8844
- [TRTLLM-8994][infra] upgrade to DLFW 25.10 and pytorch 2.9.0 / triton 3.5.0 by @ZhanruiSunCh in #8838
- [https://nvbugs/5596343] [test] Waive flaky GPT-OSS cases by @VALLIS-NERIA in #8904
- [None][fix] Fix bug of undefined py_topk_logprobs_vals by @dcaox in #8789
- [None][ci] Remove outdated test entries by @Funatiq in #8909
- [None][feat] Integrate MnnvlThroughput into TRTLLM MoE. by @bobboli in #8728
- [None][fix] Remove duplicated test waives by @chzblych in #8914
- [TRTLLM-8511][feat] Add update_weights and sleep_wakeup support for rl integration by @shuyixiong in #8302
- [#8389][fix] Update group attention matching to first map to custom torch attention by @Fridah-nv in #8638
- [https://nvbugs/5587574][fix] Increase server timeout to wait for weight loading by @pcastonguay in #8806
- [None][chore] Design diagram review process change by @yibinl-nvidia in #8748
- [None][ci] Add test on waives by @yechank-nvidia in #8915
- [None][feat] MNNVLAllreduce Kernel Refactor by @timlee0212 in #8018
- [None][chore] Add failed cases into waives.txt by @xinhe-nv in #8865
- [None][infra] Waive failed cases on main 11/05 by @EmmaQiaoCh in #8936
New Contributors
- @Wong4j made their first contribution in #7943
- @baize97 made their first contribution in #8604
- @nvxuanyuc made their first contribution in #8127
- @govind-ramnarayan made their first contribution in #8551
- @pskiran1 made their first contribution in #8150
- @mlefeb01 made their first contribution in #8770
Full Changelog: v1.2.0rc1...v1.2.0rc2