Announcement Highlights
Model Support
- Optimize the routing kernel for DeepSeek V3; add MoE TRTLLM backend support for Kimi K2 and Qwen-next (#7761)
- Support DeepSeek V3.2 with FP8/BF16 KV cache and NVFP4/BF16 KV cache (#8405; see the usage sketch after this list)
- Add EVS support for nano-v2-vlm (#8024)
- Support Qwen3 reasoning and tool parsers (#8000, #8216)
- Add Nemotron MOE support in AutoDeploy, including FP8 MOE (#8469, #8737, #8599)
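As a quick orientation for the model-support items above, here is a minimal sketch of loading one of the newly supported models through the Python LLM API. The checkpoint id and KV-cache settings are assumptions for illustration, not taken from the release notes:

```python
# Minimal sketch (assumed checkpoint id and settings): load a newly
# supported model with the LLM API and generate a completion.
from tensorrt_llm import LLM
from tensorrt_llm.llmapi import KvCacheConfig

llm = LLM(
    model="deepseek-ai/DeepSeek-V3.2-Exp",  # assumption: any supported checkpoint
    kv_cache_config=KvCacheConfig(
        free_gpu_memory_fraction=0.8,  # fraction of free GPU memory for KV cache
    ),
)
outputs = llm.generate(["Summarize this release in one sentence."])
print(outputs[0].outputs[0].text)
```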
API
Feature
- Add cuBLASLt NVFP4 GEMM backend (#7943)
- Add FP8 rowwise GEMMs for B200 (#8332)
- Enable low-precision alltoall for CUTLASS/TRTLLMGen (#8675)
- Integrate MNNVL Throughput and refactor allreduce kernel for TRTLLM MoE (#8728, #8018)
- Enable RMS norm fusion for Nemotron MOE (#8563)
- Add base64 video input support (#8458)
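For the base64 video input above (#8458), a hedged sketch of what a client request might look like follows, assuming the input arrives through the OpenAI-compatible trtllm-serve endpoint; the `video_url` content part and the model name are assumptions to be checked against the serve docs:

```python
# Hedged sketch: send a base64-encoded video as a data URI to an
# OpenAI-compatible trtllm-serve endpoint. Field names are assumptions.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

with open("clip.mp4", "rb") as f:
    video_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="nano-v2-vlm",  # placeholder: the model the server was launched with
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this video."},
            {"type": "video_url",
             "video_url": {"url": f"data:video/mp4;base64,{video_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```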
Fix & Infra
- Upgrade to DLFW 25.10, PyTorch 2.9.0, and Triton 3.5.0 (#8838)
- Fix FP8 blockwise GEMM performance with attention DP (#8501)
- Fix pipeline-parallel bubbles (#8687)
- Cache the AllReduce wrapper to avoid re-allocation hangs (#8803)
- Stabilize tests/CI with waives and slurm/CI updates (#8524, #8573, #8749, #8775, #8808, #8896, #8897)
Benchmark
Documentation
Known issue
- For this pre-release version, install using the specific version identifier: `pip3 install tensorrt-llm==1.2.0rc2`. Installing with `pip3 install tensorrt-llm --pre` will result in a broken dependency on `onnx==1.20.0rc1`. This issue will be resolved in the next release.
What's Changed
- [None][chore] update test duration by @xinhe-nv in #8377
- [None][fix] Avoid overwrite of `kv_cache_config.max_tokens` for VSWA scheme for the KVCacheManager by @eopXD in #8219
- [TRTLLM-4866] [test] Support waiving unit tests by waives.txt by @VALLIS-NERIA in #8359
- [TRTLLM-8637][feat] Optimize the routing kernel for DeepseekV3 (MoE CUTLASS backend); Add support for 384 experts (MoE TRTLLM backend) by @ChristinaZ in #7761
- [https://nvbugs/5542862][fix] Upgrade fmha_v2. by @yuxianq in #8364
- [TRTLLM-8669][infra] Use artifactory mirror for install python by @ZhanruiSunCh in #8394
- [TRTLLM-7255][feat] Add iteration log parser script for benchmark log by @yizhang-nv in #6942
- [None][ci] move some test cases from H100 to A10 by @QiJune in #8449
- [TRTLLM-8436][feat] batched sampling and top-k logprobs improvements by @ixlmar in #8398
- [None][feat] Update devcontainer configuration to include additional extensions by @Funatiq in #8369
- [https://nvbugs/5540752][fix] Support quantized Phi4 MM models by @pamelap-nvidia in #8190
- [https://nvbugs/5492250][fix] Remove isolated cases and unwaive cases by @HuiGao-NV in #8492
- [TRTLLM-6055][infra] Slurm Test refactor by @yuanjingx87 in #7176
- [https://nvbugs/5568676][fix] Remove test waive by @dongfengy in #8437
- [#8461][feat] AutoDeploy: trtllm-serve bug fix + unit test by @lucaslie in #8462
- [None] [chore] Add architecture-specific ATTRIBUTIONS files by @venkywonka in #8468
- [#8272][feat] Enable chunked prefill for SSMs in AutoDeploy by @suyoggupta in #8477
- [None][feat] Update 3rdparty/DeepGEMM to latest commit by @ruoqianguo in #8488
- [None][feat] Support kv_cache_reuse for HyperCLOVAX-Vision model by @yechank-nvidia in #7789
- [TRTLLM-8436][fix] restore list[list[list[int]]] in add_token by @ixlmar in #8502
- [None][chore] Move submit.sh to python and use yaml configuration by @zerollzeng in #8003
- [TRTLLM-7287][test] add multimodal chunked_prefill cases by @ruodil in #8011
- [None][feat] Add alltoall to trtllm-gen MoE backend. by @bobboli in #8481
- [TRTLLM-8638][fix] Add failed cases into waives.txt by @xinhe-nv in #8486
- [None][ci] rebalance H100 stages by @QiJune in #8491
- [None][feat] Support Qwen3 reasoning parser by @LinPoly in #8000
- [None][infra] Add split algorithm for slurm by @EmmaQiaoCh in #8516
- [TRTLLM-8638][fix] Remove closed bugs by @xinhe-nv in #8478
- [None][chore] Update feature combination matrix for SWA kv cache reuse by @eopXD in #8529
- [None][fix] the api_stability unify default values of None and inspect._empty by @Superjomn in #8496
- [None][infra] Waive failed tests for main 10/21 by @EmmaQiaoCh in #8524
- [None][doc] Facilitates the integration of the transfer agent by @Shixiaowei02 in #7867
- [TRTLLM-8160][feat] Add max_total_draft_tokens by @yweng0828 in #8366
- [None][chore] AutoDeploy: replace HF's deprecated keyword torch_dtype --> dtype by @lucaslie in #8510
- [TRTLLM-7843][feat] implement disagg cluster auto-scaling by @reasonsolo in #8215
- [None][feat] AutoDeploy: Add Nemotron MOE support for AutoDeploy by @nvchenghaoz in #8469
- [TRTLLM-8483][chore] Refine scheduler_config and peft_cache_config in create_py_executor by @leslie-fang25 in #8451
- [https://nvbugs/5556020][fix] test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3 dimension mismatch by @sunnyqgg in #8517
- [None][doc] Fix the incorrect doc figure by @Shixiaowei02 in #8536
- [TRTLLM-8260][feat] Add Server-Client Perf Test in pytest for B200 and B300 by @chenfeiz0326 in #7985
- [None][infra] Let CI continue running other isolation tests when an isolation test get hanging by @EmmaQiaoCh in #8471
- [TRTLLM-8638][fix] Add failed cases into waives.txt by @xinhe-nv in #8554
- [None][feat] Add vLLM KV Pool support for XQA mla kernel by @qsang-nv in #8560
- [https://nvbugs/5451272][fix] unwaive the test by @Shixiaowei02 in #8537
- [None][chore] Bump version to 1.2.0rc2 by @yiqingy0 in #8562
- [None][doc] Paragraph adjustment and fix statistic by @yunruis in #8568
- [None][infra] Waive failed cases for main branch 10/22 by @EmmaQiaoCh in #8573
- [TRTLLM-8785][fix] fix conflicts between periodic-junit and store-durations by @crazydemo in #8518
- [https://nvbugs/5594753][fix] fix rpc unique addr related issue by @Superjomn in #8419
- [#8391][fix] check perf by device subtype by @MrGeva in #8428
- [None][chore] replace print_colored_debug with logger_debug by @Superjomn in #8417
- [None][fix] generate nanobind stubs for submodules by @ixlmar in #8539
- [None][fix] fixed cached model path in test by @MrGeva in #8549
- [None][chore] add precommit hook to remove redundant tab and white space by @xinhe-nv in #8534
- [https://nvbugs/5429636][feat] Kv transfer timeout by @pcastonguay in #8459
- [None][fix] Fix EPLB CPU thread NUMA binding by @dongxuy04 in #8579
- [None][chore] Skip failing import of mxfp4_moe by @brb-nv in #8591
- [TRTLLM-8754][chore] Refine PyTorchModelEngine with llm args by @leslie-fang25 in #8493
- [TRTLLM-8682][chore] Remove auto_parallel module by @anish-shanbhag in #8329
- [None][feat] Update TRTLLM MoE MxFP4 cubins; autotune tileN by @rosenrodt in #8156
- [TRTLLM-7954][feat] Target model KV cache reallocation by @sunnyqgg in #8421
- [https://nvbugs/5604136][fix] AutoDeploy: correct import for mxfp4_moe unit test by @lucaslie in #8593
- [None][fix] Allow multi-threaded copy for GDRCopy wrapper by @dongxuy04 in #8535
- [None][feat] Enable rms norm fusion for Nemotron MOE by @suyoggupta in #8563
- [None][feat] add Nemotron-Ultra multi nodes eval tests by @xinhe-nv in #8577
- [https://nvbugs/5451205][feat] Add cuBLASLt NVFP4 GEMM backend support by @Wong4j in #7943
- [TRTLLM-8714][fix] update create_input_processor to handle custom checkpoint format by @Funatiq in #7811
- [None][infra] Fix slurm exitcode by @EmmaQiaoCh in #8585
- [None][infra] Disable rtxpro6000 stages due to nodes will be offline by @EmmaQiaoCh in #8613
- [TRTLLM-8812][chore] Limit the scope of pybind based CacheTransceiverConfig by @QiJune in #8558
- [None][chore] Cleanup GDS code by @achartier in #8475
- [#4585][feat] Replace unified attention before export by @h-guo18 in #8303
- [TRTLLM-8817][chore] Set default value of KvCacheConfig.free_gpu_memory_fraction explicitly by @QiJune in #8561
- [None][doc] add visualization of perf metrics in time breakdown tool doc by @zhengd-nv in #8530
- [None][feat] Support base64 video input by @Wanli-Jiang in #8458
- [TRTLLM-8638][fix] Add flaky failed cases into waives.txt by @xinhe-nv in #8588
- [TRTLLM-8638][fix] Add failed cases into waives.txt by @xinhe-nv in #8590
- [None][infra] enable lfs for generateLockFile pipeline by @yuanjingx87 in #8547
- [TRTLLM-8738][test] Add end-to-end trtllm-serve negative tests by @StanleySun639 in #8580
- [None][test] remove redundant runtime backend in perf test by @ruodil in #8358
- [None][chore] Add failed cases into waives.txt by @xinhe-nv in #8630
- [TRTLLM-8638][fix] fix test issues by @xinhe-nv in #8557
- [None][infra] Waive tests on main and remove lines which missed in MI by @EmmaQiaoCh in #8639
- [None][chore] Disable GB300 stages due to nodes will be offline temporarily by @yiqingy0 in #8643
- [None][feat] add skip condition in AutoDeploy's triton fused moe kernel by @suyoggupta in #8632
- [TRTLLM-7078][chore] optimal kvcache transfer for VSWA by @chuangz0 in #7952
- [None][feat] Pass KvCacheRetentionConfig to torch LlmRequest by @achartier in #8634
- [TRTLLM-8737][feat] Support media_io_kwargs on trtllm-serve by @yechank-nvidia in #8528
- [TRTLLM-8535][feat] Support DeepSeek V3.2 with FP8 + BF16 KV cache/NVFP4 + BF16 KV cache by @chang-l in #8405
- [None][feat] Support KV Connector with Disagg Prefill Worker by @jthomson04 in #8246
- [TRTLLM-8513][feat] Add back worker extension by @hchings in #8482
- [https://nvbugs/5494718][fix] Fix Single GPU Multi-node issue and OOM on DGX Spark by @SimengLiu-nv in #8514
- [TRTLLM-8238][feat] Add EVS support for nano-v2-vlm by @Wanli-Jiang in #8024
- [None][feat] AutoDeploy: Add FP8 MOE for Nemotron by @nvchenghaoz in #8599
- [None][infra] Waive failed case on main 10/26 by @EmmaQiaoCh in #8668
- [None][fix] Fix the performance issue of FP8 blockwise grouped GEMM when using attention DP by @jinyangyuan-nvidia in #8501
- [None][ci] move some time-consuming benchmark test cases to post merge by @QiJune in #8641
- [None][fix] Fix ModelConfig.from_pretrained get quant config file by @yuantailing in #8647
- [TRTLLM-8638][fix] Add failed cases into waives.txt by @xinhe-nv in #8669
- [None][docs] Update Python wheel's short-/long-descriptions by @chzblych in #8676
- [TRTLLM-8638][fix] Add failed cases into waives.txt by @xinhe-nv in #8672
- [TRTLLM-7159][docs] Add documentation for additional outputs by @Funatiq in #8325
- [https://nvbugs/5546507][fix] skip test_mistral_nemo_eagle_1gpu test cases due to CMake Error in building by @jieli-matrix in #8677
- [None][feat] Add opentelemetry tracing by @zhanghaotong in #5897
- [None][test] Add longbench v2 for long context evaluation by @baize97 in #8604
- [None] [test] Add MNNVL AlltoAll tests to pre-merge by @kaiyux in #8601
- [TRTLLM-8933][chore] remove unused update_executor_config function by @QiJune in #8678
- [TRTLLM-8832][feat] fully async _select_generated_logits with tests by @ixlmar in #8628
- [None][feat] Autodeploy: Update the ssm to use slice by @nvchenghaoz in #8667
- [None][feat] Support ignored prompt length for penalties via new sampling config parameter by @nvxuanyuc in #8127
- [TRTLLM-7318][feat] MnnvlThroughput AlltoAll implementation. by @bobboli in #7499
- [None][feat] Add FP8 rowwise GEMMs for B200 by @achartier in #8332
- [None][infra] Skip failed tests for main 10/27 by @EmmaQiaoCh in #8686
- [https://nvbugs/5608723][fix] Use local data on multimodal tests and unwaive tests by @yechank-nvidia in #8673
- [#8245][feat] Autodeploy: Guided Decoding Support by @govind-ramnarayan in #8551
- [None][infra] Minor Update on Perf Sanity Testdb Files by @chenfeiz0326 in #8607
- [None][chore] ISOLATE some cases by @HuiGao-NV in #8690
- [None][chore] Use a cached model path for Ray integration test by @achartier in #8660
- [https://nvbugs/5556475] [fix] Fix the `tensorrt_llm_bls` model to correctly return the outputs for `num_input_tokens` and `num_output_tokens` by @pskiran1 in #8150
- [None][fix] Change Ray submit() to use async RPC by @hchings in #8636
- [None][test] Add gpt_oss_20b Model to Sanity Perf Test by @yufeiwu-nv in #8265
- [TRTLLM-7835][test] add default sample config for perf test by @ruodil in #8523
- [https://nvbugs/5564465][test] ensure deepseek_v3_lite isl + osl < max_seq_len by @ruodil in #8565
- [None][fix] fix EPLB init hang by @dongxuy04 in #8649
- [#8694][fix] fix AutoDeploy cuda memory access failure in NemotronHMoe by @MrGeva in #8696
- [None][chore] Revert "[TRTLLM-7835][test] add default sample config for perf test (#8523)" by @Funatiq in #8725
- [https://nvbugs/5575913][fix] Use separate thresholds for 120b/20b gptoss by @dongfengy in #8664
- [None][fix] Properly raise error for nemotron H models by @2ez4bz in #8697
- [TRTLLM-8684][chore] Migrate BuildConfig to Pydantic, add a Python wrapper for KVCacheType enum by @anish-shanbhag in #8330
- [https://nvbugs/5606166][fix] AutoDeploy: use tuples for cudagraph shape lookup by @lucaslie in #8658
- [https://nvbugs/5596343][test] Update test waive to get back some coverage by @dongfengy in #8702
- [https://nvbugs/5549111][fix] Fix 2-model overlap scheduler accuracy on very long prompts by @mikeiovine in #8076
- [TRTLLM-8827] [feat] Enable low precision alltoall for Cutlass and TRTLLMGen backends by @kaiyux in #8675
- [TRTLLM-8431][doc] update public doc and example, add etcd auto-scaling tests by @reasonsolo in #8602
- [https://nvbugs/5596377][fix] Fix mm dummy calculation by @yechank-nvidia in #8498
- [https://nvbugs/5612529][fix] Fix transferAgent_test by @chuangz0 in #8710
- [None][test] fix a typo in perf test sampler config by @ruodil in #8726
- [None][feat] add detailed KV cache transfer time breakdown by @zhengd-nv in #8521
- [None][ci] waive test_rpc.py temporarily by @Superjomn in #8743
- [https://nvbugs/5613456][chore] Skip test_ptp_quickstart_advanced_multi_gpus[DeepSeek-V3-671B-FP8-DeepSeek-V3-0324-8] due to Model Creation OOM by @xinhe-nv in #8684
- [https://nvbugs/5549829][fix] Qwen2.5-VL TP > 1 + Quantized weight load fix by @yechank-nvidia in #8680
- [None][feat] Enable nvfp4 cuda core for sm120 by @Njuapp in #8620
- [None][perf] Use fp8 quant kernel in DS3.2 indexer module by @chang-l in #8701
- [https://nvbugs/5593199][test] Enhance beam search tests deterministic dummy model by @stnie in #8625
- [None][feat] add flag for EPLB to force using GDRCopy by @dongxuy04 in #8650
- [None][fix] add readme copy to wheel stage to avoid setup.py failure by @farazkh80 in #8736
- [None][infra] update ci allow list 2025/10/29 by @niukuo in #8749
- [TRTLLM-8214][feat] Support Qwen3 tool parser by @LinPoly in #8216
- [https://nvbugs/5607238][test] fix working dir in disagg worker test by @zhengd-nv in #8648
- [https://nvbugs/5550409][fix] Disable torch compile in piecewise attention part to Avoid host overhead by @yizhang-nv in #8708
- [None][chore] Enable GPQA in CI for DeepSeek V3.2 by @chang-l in #8712
- [https://nvbugs/5534574][fix] disable spec decoding forever once the request spec decoding is disabled by @kris1025 in #8446
- [None][fix] fix config loading for DeepSeek-V3.2 in trtllm-bench by @lfr-0531 in #8729
- [None][ci] waive test_rpc.py by @Superjomn in #8745
- [TRTLLM-8763][chore] Deprecate pybind based GuidedDecodingConfig usage in torch backend by @leslie-fang25 in #8717
- [None][infra] Waive failed tests on main 10/29 by @EmmaQiaoCh in #8759
- [TRTLLM-8976][feat] Move indexer-k-cache to KVCacheManager by @Tabrizian in #8699
- [None][infra] Check in most recent lock file from nightly pipeline by @yuanjingx87 in #8739
- [https://nvbugs/5599086][fix] Fix FP8 Linear module for spark by @SimengLiu-nv in #8707
- [None][doc] Minor doc update to disagg-serving by @schetlur-nv in #8768
- [https://nvbugs/5547414][fix] Use cached models by @HuiGao-NV in #8755
- [None][fix] Fix UnboundLocalError. by @yuxianq in #8756
- [TRTLLM-8971][infra] Update gpu key for B300/GB300 by @EmmaQiaoCh in #8724
- [None][doc] Add doc for torch.compile & piecewise cuda graph by @yizhang-nv in #8527
- [None][infra] Unwaive the tests passed in latest CI and disable a perf stage by @EmmaQiaoCh in #8775
- [None][infra] fix slurm results path by @yuanjingx87 in #8751
- [None][fix] fix runtime error that bf16 input is not quantized to nvfp4 when use bf16 dispatch by @yilin-void in #8507
- [https://nvbugs/5608461][fix] exclude InductorSubproc from thread leak check by @leslie-fang25 in #8704
- [https://nvbugs/5481206][fix] update waives by @xinhe-nv in #8774
- [None][feat] Refactor scaffolding streaming feature and fix openai wo… by @WeiHaocheng in #8622
- [None][feat] Add unit tests and revision in block_level kernel for invalid input by @ChristinaZ in #8718
- [TRTLLM-5453][infra] Check all steps for test name and also check the test in waives.txt also exists in l0 or qa test list. by @EmmaQiaoCh in #6256
- [None][feat] Autotuner can iterate through all tactics for test purposes by @rosenrodt in #8663
- [None][feat] Add layer wise benchmarks by @yuantailing in #8777
- [TRTLLM-8930][infra] Force Blossom perf test stages to use 'tensorrt/test_type: perf' in the K8S template by @ZhanruiSunCh in #8752
- [None][infra] Waive failed case for main branch by @EmmaQiaoCh in #8797
- [None][fix] Layer-wise benchmarks: use local models, lint by @yuantailing in #8799
- [TRTLLM-8734][feat] AutoDeploy: Enable the nvfp4 for Nemotron MOE by @nvchenghaoz in #8737
- [TRTLLM-8978][test] Remove llama 4 spec dec tests by @mikeiovine in #8766
- [None][infra] Update allow list 20251030 by @yuanjingx87 in #8808
- [https://nvbugs/5575687][fix] fix moe_gemm's preexit position that cause illegal memory access by @dc3671 in #8786
- [None][feat] Add disagg relay time to time breakdown tool by @nv-yilinf in #8465
- [https://nvbugs/5599515][fix] Fix PP bubbles. by @yuxianq in #8687
- [TRTLLM-9003][infra] Add python OpenSearchDB query / push. by @ZhanruiSunCh in #8506
- [https://nvbugs/5623960][fix] Compress the warning log of AutoTuner when encountering tactic failures. by @hyukn in #8793
- [None][chore] use cached vila model by @HuiGao-NV in #8788
- [None][info] Waive failed case for main by @EmmaQiaoCh in #8826
- [https://nvbugs/5617275][fix] Extract py files from prebuilt wheel for editable installs by @chang-l in #8738
- [None][fix] Waive layer-wise benchmark tests by @yuantailing in #8823
- [https://nvbugs/5558117][fix] Allow per-layer quant config from hf_quant_config.json by @rosenrodt in #8617
- [None][perf] AutoDeploy optimize _get_unique_value by @suyoggupta in #8822
- [https://nvbugs/5614506][chore] Adding e+p+d e2e test by @pcastonguay in #8801
- [None][doc] Clarify the perf best practice and supported hardware for gptoss by @dongfengy in #8665
- [https://nvbugs/5474119][fix] Re-enable test by @dongfengy in #8809
- [TRTINFRA-7215][infra] Add support for enroot SLURM clusters by @mlefeb01 in #8770
- [TRTLLM-8541][feat] Add trtllm-gen sparse MLA kernels to support per-Tensor FP8 KV Cache by @lfr-0531 in #8692
- [TRTLLM-7008][fix] Enable GDRCopy and unwaive online eplb tests by @dongxuy04 in #8720
- [TRTLLM-7731][feat] Avoid over-allocation of KV cache for transmission in disagg with CP by @brb-nv in #8145
- [TRTLLM-8836][chore] Create ModelEngine from LlmArgs by @QiJune in #8600
- [None][fix] Rename: slot_count -> invalid_expert_id by @bobboli in #8783
- [https://nvbugs/5437384][test] cherry-pick fix trtllm llmapi launch multi tests by @Superjomn in #8567
- [None][feat] Use ruff for formatting and linting new files by default by @Funatiq in #8629
- [None][fix] WAR for tensorrt depending on the archived nvidia-cuda-runtime-cu13 package by @chzblych in #8857
- [#8781][fix] Cache the AllReduce wrapper to avoid re-allocating workspace which caused a hang by @MrGeva in #8803
- [None][test] Enhance GPT-OSS CI with GPQA Diamond and additional Spec Decoding Test by @dongfengy in #8661
- [None][feat] Fix attention sink load in xqa by @qsang-nv in #8836
- [https://nvbugs/5625380][chore] Remove multimodal related fields from decoder llm input by @chang-l in #8846
- [TRTLLM-8991][test] Add Llama 3.3 70B model with different performance config by @yufeiwu-nv in #8753
- [None][update] optimized sparse mla kernels && fix unspecified cuda launch by @PerkzZheng in #8866
- [None][feat] Add benchmark to DeepConf by @dcaox in #8776
- [TRTLLM-8825][feat] Support Pytest Perf Results uploading to Database by @chenfeiz0326 in #8653
- [https://nvbugs/5523315][fix] Fix serve benchmark test by @yechank-nvidia in #8255
- [https://nvbugs/5625962][chore] unwaive DS-v32-fp4 tests by @lfr-0531 in #8853
- [None][chore] Add failed cases into waives.txt by @xinhe-nv in #8872
- [None][doc] Add LLM-API API change principle by @Superjomn in #8350
- [TRTLLM-9073/9087][doc] Add the missing content for model support section and fix valid links for long_sequence.md by @nv-guomingz in #8869
- [https://nvbugs/5521799][fix] add harmony channel validation by @xinhe-nv in #8837
- [None][fix] Fix import issues in layer-wise benchmarks by @yuantailing in #8827
- [None][infra] Waive the failed test for main on 11/3 by @EmmaQiaoCh in #8875
- [TRTLLM-8435][infra] Test existing rtxpro6000 stages on rtxpro6000d by @EmmaQiaoCh in #8319
- [TRTLLM-6928][fix] Refactor multimodal unittest by @yechank-nvidia in #8453
- [None][fix] Fix cute dsl nvfp4 gemm autotune issue by @limin2021 in #8761
- [None][chore] Add sample yaml for wide-ep example and minor fixes by @kaiyux in #8825
- [TRTINFRA-7215][infra] - Move half of the DGX H100 premerge tests to SLURM by @mlefeb01 in #8849
- [TRTLLM-8979][test] Improve qwen3 spec dec test coverage by @mikeiovine in #8767
- [TRTLLM-5966][feat] Helix: add full MLA support for Helix by @MatthiasKohl in #8104
- [TRTLLM-8680][doc] Add table with one-line deployment commands to docs by @anish-shanbhag in #8173
- [None][chore] Optimize perf for the RPC executor and add some profile utilities to llm-api by @Superjomn in #8415
- [TRTLLM-8690][feat] add more tensors to share buffers by @HuiGao-NV in #8691
- [None][infra] waive failed test on main 11/4 by @ZhanruiSunCh in #8896
- [None][infra] Waive failed tests for main branch by @EmmaQiaoCh in #8897
- [None][fix] InputProcessor config naming convention fix by @yechank-nvidia in #8705
- [https://nvbugs/5625990][chore] Add test coverage for current incapability of the KV cache manager by @eopXD in #8829
- [None][chore] Weekly mass integration of release/1.1 by @mikeiovine in #8508
- [TRTLLM-1234][feat] Add fp8 blockscaled Gemm for sm120 by @CarstyYou in #8844
- [TRTLLM-8994][infra] upgrade to DLFW 25.10 and pytorch 2.9.0 / triton 3.5.0 by @ZhanruiSunCh in #8838
- [https://nvbugs/5596343] [test] Waive flaky GPT-OSS cases by @VALLIS-NERIA in #8904
- [None][fix] Fix bug of undefined py_topk_logprobs_vals by @dcaox in #8789
- [None][ci] Remove outdated test entries by @Funatiq in #8909
- [None][feat] Integrate MnnvlThroughput into TRTLLM MoE. by @bobboli in #8728
- [None][fix] Remove duplicated test waives by @chzblych in #8914
- [TRTLLM-8511][feat] Add update_weights and sleep_wakeup support for rl integration by @shuyixiong in #8302
- [#8389][fix] Update group attention matching to first map to custom torch attention by @Fridah-nv in #8638
- [https://nvbugs/5587574][fix] Increase server timeout to wait for weight loading by @pcastonguay in #8806
- [None][chore] Design diagram review process change by @yibinl-nvidia in #8748
- [None][ci] Add test on waives by @yechank-nvidia in #8915
- [None][feat] MNNVLAllreduce Kernel Refactor by @timlee0212 in #8018
- [None][chore] Add failed cases into waives.txt by @xinhe-nv in #8865
- [None][infra] Waive failed cases on main 11/05 by @EmmaQiaoCh in #8936
New Contributors
- @Wong4j made their first contribution in #7943
- @baize97 made their first contribution in #8604
- @nvxuanyuc made their first contribution in #8127
- @govind-ramnarayan made their first contribution in #8551
- @pskiran1 made their first contribution in #8150
- @mlefeb01 made their first contribution in #8770
Full Changelog: v1.2.0rc1...v1.2.0rc2