NVIDIA/TensorRT-LLM v1.3.0rc7

Pre-release · 7 hours ago

Highlights

  • Model Support

    • Support tensor parallelism in the TRTLLM MoE backend for the Nemotron-H model (#11470)
    • Add Kimi-K2.5 text model support (NVFP4) (#11777)
    • Add Helix CP support for DSV3.2 (#11507)
    • Support mixed quantization between shared experts and routed experts for DSV3 (#11215)
    • Support Cohere Command A model (#11505)
    • Extract embeddings as .safetensors and support float8-quantized models (#11180)
  • API

    • Add --served-model-name option to serve command (#11711)
    • Add flag to trtllm serve to override KV cache dtype (#11487)
    • Use string stop/bad words in gRPC proto instead of pre-tokenized TokenSequence (#11888)
    • Support multimodal image input in gRPC server (#11800)
    • Expose use_python_scheduler in SchedulerConfig and add associated tests (#11884)
    • Add max_gpu_total_bytes to control KVCacheManagerV2 capacity (#11907)
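
  For example, the new `--served-model-name` flag lets clients address the model by an alias rather than by its checkpoint path. A minimal launch sketch (the checkpoint path and alias below are placeholders, not taken from this release):

  ```shell
  # Hypothetical invocation: serve a local checkpoint under an alias that
  # clients then pass as the "model" field in OpenAI-compatible requests.
  trtllm serve ./checkpoints/my-model \
      --served-model-name my-model-alias
  ```
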
  • Feature

    • Support PARD (Parallel Draft Model) in one-model speculative decoding (#11438)
    • Enable autotuner for VisualGen and compilation config support (#11660)
    • Add globaltimer-based timing backend for autotuner profiling (#11657)
    • Support heterogeneous tokens_per_block (#11751)
    • Refactor KVCacheManagerV2 to simplify new model support (#11749)
    • Support Helix CP with GQA (#11570)
    • Add option to skip KV cache memory estimation (#11714)
    • Implement suffix automaton on device for speculative decoding and one-model support (#11434)
    • Separate radix search tree implementation (#10862)
    • Add support for expert_number ≤ 2048 and K ≤ 32 (#11510)
    • Add support for bidirectional sliding window attention mask to fmha_v2 (#11212)
    • Avoid duplicated computation with ADP + Helix CP in GQA (#11891)
    • Add explicit video encode format support (#11830)
    • Refactor video encoding to use ffmpeg CLI or pure Python fallback (#11672)
    • Integrate CuTe DSL top-k kernel for Blackwell (#11900)
    • Integrate suffix automaton with EAGLE3 and PARD (#11878)
    • Add 5D A2A for fused Ulysses (#11787)
    • Add SiLU to trtllm-gen MoE (#11663)
    • Optimize by fusing nvfp4_quant into layernorm_gated for mamba2_mixer (#11473)
    • Wire KVCacheBlock to UnifiedBlockTree using lookup-node pointers (#11919)
    • Run an extra general warmup pass to warm up the memory pool (#10340)
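
  Several of the speculative-decoding features above build on a suffix automaton over the generated token history, implemented on-device (#11434) and integrated with EAGLE3 and PARD (#11878). The pure-Python sketch below only illustrates the underlying data structure, not the CUDA implementation: after extending the automaton token by token, any contiguous substring of the history can be matched in time linear in its length.

  ```python
  class SuffixAutomaton:
      """Illustrative host-side suffix automaton over a token sequence."""

      def __init__(self):
          # Per-state arrays: outgoing transitions, suffix link, and the
          # length of the longest substring ending in that state.
          self.next = [{}]
          self.link = [-1]
          self.len = [0]
          self.last = 0

      def extend(self, token):
          # Standard online construction: add one token to the history.
          cur = len(self.len)
          self.next.append({})
          self.link.append(-1)
          self.len.append(self.len[self.last] + 1)
          p = self.last
          while p != -1 and token not in self.next[p]:
              self.next[p][token] = cur
              p = self.link[p]
          if p == -1:
              self.link[cur] = 0
          else:
              q = self.next[p][token]
              if self.len[p] + 1 == self.len[q]:
                  self.link[cur] = q
              else:
                  # Clone q so shorter suffixes keep correct transitions.
                  clone = len(self.len)
                  self.next.append(dict(self.next[q]))
                  self.link.append(self.link[q])
                  self.len.append(self.len[p] + 1)
                  while p != -1 and self.next[p].get(token) == q:
                      self.next[p][token] = clone
                      p = self.link[p]
                  self.link[q] = clone
                  self.link[cur] = clone
          self.last = cur

      def contains(self, tokens):
          # From the root, every substring of the history is reachable.
          s = 0
          for t in tokens:
              if t not in self.next[s]:
                  return False
              s = self.next[s][t]
          return True

  sam = SuffixAutomaton()
  for tok in [3, 1, 4, 1, 5, 9, 2, 6]:
      sam.extend(tok)
  assert sam.contains([1, 5, 9])   # contiguous substring of the history
  assert not sam.contains([9, 9])  # never appeared
  ```

  In the speculative-decoding setting, matching the current suffix of the generated text against this structure is what lets the drafter propose continuations drawn from earlier context.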
  • Fix

    • Add async worker to MTP/EAGLE3 sampler (#11573)
    • Fix disaggregated cancellation (#11730)
    • Use prefer_pinned() in pard.py (#11762)
    • Release KVCacheManagerV2 memory immediately on shutdown (#11746)
    • Remove duplicated MoE computation with Helix CP+DP (#11167)
    • Register add+norm fallback pass for torch.compile in multi-GPU mode (#11739)
    • Propagate logprobs from prefill to decode in disaggregated serving (#11727)
    • Propagate logits from prefill to decode in disaggregated serving (#11767)
    • Enable separate draft KV cache pool for aggregated mode and KVBM (#11689)
    • Fix warnings when building moe_kernels.cu (#11703)
    • Fix available_blocks typo in scheduler (#11801)
    • Clean up memory in rollout process (#11658)
    • Warm up maybe_compiled_cat in forward_context_with_chunked_prefill (#11743)
    • Fix DeepEPLowLatency with CuTe DSL MoE backend (#11769)
    • Fix FP8 per-tensor torch.compile graph break in dynamic quantization (#11759)
    • Fix streaming generation logits and speed up logits testcase (#10637)
    • Fix overly aggressive capacity scheduler (#11731)
    • Use proper tokens when exclude_input_in_output is true (#9453)
    • Move launch_dependent_grids after tmem free to fix race (#11812)
    • Fix E/PD disaggregated chunked prefill bug (#11805)
    • Fix SM120 issue for rms_norm with nvfp4_quant_fusion (#11774)
    • Remove dead code (#11813)
    • Fix KVCacheManagerV2 OOM and dummy request allocation in chunked prefill / pipeline parallel (#11710)
    • Fix AttributeError when DSA indexer accesses non-DSA KVCacheManager (#11858)
    • Override mMaxAttentionWindow with actual largest window size (#11842)
    • Update check_is_moe to support mlp_layer_types after config.json update (#11477)
    • Fix incorrect GPU timing in time breakdown under overlap scheduler (#11860)
    • Fix OOM hang with NCCL_SYMMETRIC fallback during long-context inference (#11870)
    • Fix position IDs input for Qwen3.5 text-only usage (#11877)
    • Disable preload for Llama4 Scout (#11873)
    • Fix formatting issue in tensorrt_llm/serve/openai_server.py (#11920)
    • Prevent RuntimeError from dict mutation during iteration in EXAONE MoE weight mapper (#11862)
    • Fix Nemotron MTP crash on SM90 (#11807)
    • Fix Mistral Large3 + EAGLE bug (#11942, #11885)
    • Fix TeaCache broken caching for FLUX.1 and FLUX.2 (#11868)
    • Fix FLUX.1 TeaCache polynomial coefficients and defaults (#12007)
    • Implement workaround for ClientPayloadError (#12018)
    • Fix duplicate model entry in model list (#12029)
    • Fix Python string truthiness bug in FMHA cubin selection (#11909)
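
  The last fix above (#11909) is an instance of a classic Python pitfall: an empty string is falsy, so a bare `if s:` check conflates "not provided" with "provided but empty". The sketch below uses a hypothetical `pick_kernel` helper (not the actual cubin-selection code) to show the buggy and corrected checks:

  ```python
  # Illustrative only: `pick_kernel` is a hypothetical helper, not the
  # actual TensorRT-LLM FMHA cubin-selection code.

  def pick_kernel(override=None):
      # Buggy: an empty string is falsy, so `if override:` treats ""
      # the same as None and silently falls back to the default.
      if override:
          return override
      return "default"

  def pick_kernel_fixed(override=None):
      # Fixed: only fall back when no override was supplied at all.
      if override is not None:
          return override
      return "default"

  assert pick_kernel("") == "default"   # empty override silently dropped
  assert pick_kernel_fixed("") == ""    # empty override honored
  ```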
  • Documentation

    • Fix typos, grammar, and accuracy across documentation (#11766)
    • Add sparse attention tech blog (#11644)
    • Add known issue for disaggregated serving hang with asymmetric PP/TP (#11789)
    • Fix documentation links (#11912)
    • Replace “TensorRT-LLM” with “TensorRT LLM” (#11914)
    • Add CI trigger and test-failure retrieval instructions to AGENTS.md (#11803)
  • Benchmark

    • Vectorize quantize_fp8_blockwise with CUDA kernel (#11724)
    • Use F.rms_norm for per-head QK normalization in VisualGen (#11798)
    • Short-sequence MHA optimization for DSA MLA prefill (#11677)
    • Parallel VAE harness and implementation for WAN (#11875)
    • Add Triton FP8 blockwise quant kernel and autotuner bucket-skip for VisualGen (#11854)
    • Optimize _prepare_inputs host time (#11704)
    • Improve are_stop_words performance (#11196)
    • Add DeepSeek RCCA performance test case (#11736)
    • Add VisualGen benchmarking script (#11651)
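
  For readers unfamiliar with blockwise FP8 quantization (as in the vectorized `quantize_fp8_blockwise` kernel above), the sketch below shows the numerics in pure Python: each fixed-size block gets one scale chosen so its absolute maximum maps to the FP8 E4M3 max-representable value. The 4-element block size and helper names are illustrative; the real kernel works on GPU tensors and also rounds values onto the FP8 grid, which this sketch omits.

  ```python
  # Illustrative pure-Python sketch of blockwise FP8-style quantization.
  FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3
  BLOCK = 4             # real kernels typically use 128-element blocks

  def quantize_blockwise(x, block=BLOCK):
      scales, q = [], []
      for i in range(0, len(x), block):
          chunk = x[i:i + block]
          amax = max(abs(v) for v in chunk) or 1.0  # avoid divide-by-zero
          scale = amax / FP8_E4M3_MAX               # one scale per block
          scales.append(scale)
          q.extend(v / scale for v in chunk)        # now within [-448, 448]
      return q, scales

  def dequantize_blockwise(q, scales, block=BLOCK):
      return [v * scales[i // block] for i, v in enumerate(q)]

  x = [0.1, -2.0, 0.5, 3.0, 100.0, -7.0, 0.0, 1.5]
  q, scales = quantize_blockwise(x)
  assert all(abs(v) <= FP8_E4M3_MAX for v in q)
  x_rt = dequantize_blockwise(q, scales)
  ```

  Because this sketch skips the final cast to FP8, the round trip is exact; the real kernel trades that exactness for 8-bit storage, with the per-block scale bounding the rounding error.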
  • Test & Infra

    • Add tests for all database configs (#11653)
    • Move B200 test stage to AIHub (#11692)
    • Support local wheel installation and add GB300 demo cases (#11742)
    • Remove submodule pulls from TRT-LLM git checkouts (#11693)
    • Add back WAN VBench test in CI (#11804)
    • Add E2E test for cancelled disaggregated generation requests with overlap scheduler (#11795)
    • Pass Nsight options to ray_executor and trigger profiling through collective_rpc (#11493)
    • Add B200 multi-node tests DB (#11783)
    • Add sanity tests for release 1.2 version (#11738)
    • Add QA test case for trust-remote-code on multi-node failure (#11905)
    • Fix model_name Starcoder 15B allowed-models issue (#11981)
    • Upgrade xgrammar from 0.1.25 to 0.1.32 (#12016)
    • Limit TileIRAS to CUDA 13.1 (#12042)
    • Remove VisualGen benchmark test from YAML (#12027)

What's Changed

  • [None][feat] Support tensor parallelism for nemotron-h model by @Wanli-Jiang in #11470
  • [None][test] Add tests for all database configs. by @fsaady in #11653
  • [https://nvbugs/5911143][fix] add async worker to MTP/Eagle3 sampler,… by @dhansen-nvidia in #11573
  • [TRTLLM-10886][feat] Support PARD(Parallel Draft Model) in one-model spec dec by @ziyixiong-nv in #11438
  • [None][fix] Fix disagg cancellation by @Tabrizian in #11730
  • [None][fix] Use prefer_pinned() in pard.py by @mikeiovine in #11762
  • [None][fix] Make KVCacheManagerV2 release mem immediately on shutdown by @lowsfer in #11746
  • [TRTLLM-11115][feat] enable autotuner for visual gen + Compilation Config by @NVShreyas in #11660
  • [None][chore] Minor fix in w4a8 mxfp4 mxfp8 test. by @Tracin in #11745
  • [None][infra] Move B200 test stage to AIHub by @yuanjingx87 in #11692
  • [None][infra] Waive failed cases for main on 02/27 by @EmmaQiaoCh in #11770
  • [TRTLLM-11064][fix] Remove duplicated MoE Computation with Helix CP+DP by @brb-nv in #11167
  • [TRTLLM-10386][fix] torch.compile: register add+norm fallback pass in multi-GPU mode by @luyiyun1021 in #11739
  • [None][feat] Support heterogeneous tokens_per_block by @lowsfer in #11751
  • [None][chore] Remove closed bugs by @xinhe-nv in #11527
  • [None][test] local wheel installation support and add gb300 cases demo by @fredricz-20070104 in #11742
  • [None][feat] Refactor cache manager v2 to simplify new model support by @jiaganc in #11749
  • [https://nvbugs/5879614][fix] Waive test_guided_decoding_with_eagle3 xgrammar in disaggregated serving by @ziyixiong-nv in #11773
  • [https://nvbugs/5911788][test] Waive test_llm_partial_update_weights[Qwen3/Qwen3-8B] by @liji-nv in #11785
  • [None][feat] add globaltimer-based timing backend for autotuner profi… by @dhansen-nvidia in #11657
  • [https://nvbugs/5926823][fix] Propagate logprobs from prefill to decode in disagg by @brb-nv in #11727
  • [TRTLLMINF-9][chore] Remove submodule pulls from TRT-LLM git checkouts by @dpitman-nvda in #11693
  • [https://nvbugs/5685010][fix] Delete test_eagle3_output_repetition_4gpus flaky assertions. by @zheyuf in #11725
  • [None][fix] enable separate draft KV cache pool for aggregated + KVBM… by @zyang-Modular in #11689
  • [TRTLLM-11058][feat] Support Helix CP with GQA by @brb-nv in #11570
  • [None][perf] Vectorize quantize_fp8_blockwise with CUDA kernel by @karljang in #11724
  • [https://nvbugs/5868616][fix] Fix warnings when building moe_kernels.cu by @yumin066 in #11703
  • [None][chore] Add CI trigger and test failure retrieval instructions to AGENTS.md by @lucaslie in #11803
  • [None][fix] Fix typo: avaiable_blocks -> available_blocks in scheduler by @kaiyux in #11801
  • [TRTLLM-11568][feat] Fix collective calls by @greg-kwasniewski1 in #11632
  • [None][perf] Use F.rms_norm for per-head QK normalization in visual gen by @karljang in #11798
  • [TRTLLM-11185][test] Add back WAN VBench test in CI by @chang-l in #11804
  • [TRTLLM-9782][feat] Support to skip KV cache memory estimation by @HuiGao-NV in #11714
  • [None][doc] Fix typos, grammar, and accuracy across documentation by @kaiyux in #11766
  • [None][fix] cleanup mem in rollout process by @hchings in #11658
  • [None][feat] Add --served-model-name option to serve command by @slin1237 in #11711
  • [None][chore] Update AGENTS.md by @lucaslie in #11809
  • [None][fix] AutoDeploy: Fix shape handling for singleton prefill by @galagam in #11679
  • [None][infra] Waive failed cases for main on 03/01 by @EmmaQiaoCh in #11811
  • [None][feat] TRT-LLM Gen MoE finalize kernel optimization by @nekorobov in #11501
  • [None][test] Add E2E test for cancelled disagg gen request with overlap scheduler by @Tabrizian in #11795
  • [None][chore] pass nsight options to ray_executor and trigger profiling through collective_rpc by @davidmlw in #11493
  • [TRTLLM-10962][feat] Refactor video encoding to use ffmpeg CLI or pur… by @JunyiXu-nv in #11672
  • [https://nvbugs/5823212][fix] Warmup maybe_compiled_cat in forward_context_with_chunked_prefill by @yuantailing in #11743
  • [None][feat] Extract embeding as .savetensors and support float8 quantized model by @nvyocox in #11180
  • [https://nvbugs/5885070][fix] fix deepeplowlatency with cutedsl moe backend by @leslie-fang25 in #11769
  • [None][fix] Fix FP8 per-tensor torch.compile graph break in dynamic quantization by @karljang in #11759
  • [TRTLLM-9687][feat] Improve are_stop_words performance by @stnie in #11196
  • [https://nvbugs/5883738][fix] fix bug for illegal memory access on Qwen3-235B-A22B-Thinking-2507-NVFP4 + Eagle3 by @sunnyqgg in #11474
  • [#10693][chore] AutoDeploy: Add L1 tests from coverage dashboard by @marinayanov in #11530
  • [https://nvbugs/5764627][fix] Fix generation logits with streaming and improve runtime of logits testcase. Also fixes https://nvbugs/5573238 by @stnie in #10637
  • [https://nvbugs/5934461][fix] Propagate logits from prefill to decode in disagg by @brb-nv in #11767
  • [#11726][feat] AutoDeploy: Fuse gemms of mixed children by @taylor-yb-lee in #11793
  • [None][fix] Fix overly aggressive capacity scheduler by @jthomson04 in #11731
  • [https://nvbugs/5689262][fix] use proper tokens when exclude_input_in_output is true by @lazykyama in #9453
  • [https://nvbugs/5863912][fix] Fix with move launch_dependent_grids after tmem free by @benzh-2025 in #11812
  • [https://nvbugs/5938603][fix] Fix E/PD disagg chunked prefill bug by @2ez4bz in #11805
  • [None][test] add deepseek RCCA perf test case by @ruodil in #11736
  • [None][fix] remove torch compile models arg by @NVShreyas in #11836
  • [None][test] add b200 multi nodes tests db by @xinhe-nv in #11783
  • [None][fix] Fix SM120 issue for rms_norm with nvfp4_quant_fusion by @Wanli-Jiang in #11774
  • [None][infra] Waive failed cases for main for post-merge 2564 by @ZhanruiSunCh in #11848
  • [https://nvbugs/5936502][fix] remove dead codes by @bo-nv in #11813
  • [None][chore] a GitHub Action to assign the PR to the author by @zhenhuaw-me in #11673
  • [None][infra] Fix a typo in waives.txt by @EmmaQiaoCh in #11852
  • [None][test] Fix wrong lora config by @yufeiwu-nv in #11818
  • [None][test] fix flaky issues by @xinhe-nv in #11814
  • [None][fix] Fix OOM issue/dummy request allocation/chunked prefill/pp for KV Cache Manager V2 by @yizhang-nv in #11710
  • [None][test] update waive list by @xinhe-nv in #11815
  • [TRTLLM-9939][perf] Short-sequence MHA optimization for DSA MLA prefill by @kaiyux in #11677
  • [None][refactor] Revisit attention interface for AutoDeploy by @lucaslie in #11796
  • [None][feat] Add a flag in trtllm serve to support overriding kv cache dtype by @cjluo-nv in #11487
  • [TRTLLMINF-9][chore] Use checkoutFile in mergeWaiveList to avoid full clone by @dpitman-nvda in #11794
  • [None][chore] Refresh inferenceX configs in recipes by @venkywonka in #11595
  • [TRTLLM-11042][feat] Implement suffix automaton on device for spec and support one model by @cascade812 in #11434
  • [https://nvbugs/5941681][fix] Handle dict type for speculative_config by @ziyixiong-nv in #11828
  • [None][feat] Add Kimi-K2.5 text model support (NVFP4) by @lancelly in #11777
  • [None][chore] Bump version to 1.3.0rc7 by @yuanjingx87 in #11864
  • [https://nvbugs/5919026][fix] Fix AttributeError when DSA indexer accesses non-DSA kv_cache_manager by @ziyixiong-nv in #11858
  • [TRTLLM-11184][feat] Explicit video encode format support by @JunyiXu-nv in #11830
  • [None][test] Enable DeepGemm + DeepEPLowLatency MoE test combination by @Tabrizian in #11876
  • [#10009][fix] Fix json_schema response_format to support OpenAI API w… by @JunyiXu-nv in #11497
  • [https://nvbugs/5927620][fix] Override mMaxAttentionWindow with the actual largest window size by @ziyixiong-nv in #11842
  • [None][feat] Support mix quantization between shared experts and routed experts for dsv3 by @dmtri35 in #11215
  • [#11666][fix] Fix inmemory model dir detection by @capyun007 in #11753
  • [None][infra] Waive 3 failed cases for main in post-merge 2566 by @ZhanruiSunCh in #11881
  • [None][doc] Add sparse attention tech blog by @heyuhhh in #11644
  • [TRTLLM-9392][feat] Support MoE output to alltoall's workspace for all the quantization recipe of trtllm-gen. by @bobboli in #11449
  • [TRTLLM-10852][feat] Enhance logprobs functionality to always return prompt token logprobs in prompt logprobs by @stnie in #11235
  • [None][fix] Fix typos, grammar, and formatting in comments and docstrings by @kaiyux in #11826
  • [None][fix] Update check_is_moe into support mlp_layer_types after config.json update by @eagle705 in #11477
  • [https://nvbugs/5946303][fix] Fix incorrect GPU timing in time breakdown under overlap scheduler by @luyiyun1021 in #11860
  • [None][chore] Update autotuner by @jiahanc in #11859
  • [None][chore] Handle failure in auto-assign author workflow by @zhenhuaw-me in #11906
  • [https://nvbugs/5930934][fix] Fix OOM hang with NCCL_SYMMETRIC fallback during long-context inference by @peihu-nv in #11870
  • [None][fix] Qwen3.5 fix positions ids input for text-only usage by @bmarimuthu-nv in #11877
  • [None][fix] Refactor nanoV3+superV3 accuracy tests to load example config by @galagam in #11458
  • [None][chore] Deprecate eagle3 2-model by @mikeiovine in #11761
  • [#11819][fix] Disable preload for Llama4 scout by @taylor-yb-lee in #11873
  • [None][chore] Fix format issue in tensorrt_llm/serve/openai_server.py by @chienchunhung in #11920
  • [None][feat] Separate radix search tree implementation by @thorjohnsen in #10862
  • [None][feat] Add support for expert_number<=2048 and K<=32 by @ChristinaZ in #11510
  • [None][infra] Waive 1 failed cases for main in pre-merge 29212 by @ZhanruiSunCh in #11929
  • [None][fix] remove leak check for kimi by @xinhe-nv in #11825
  • [https://nvbugs/5907477][chore] unwaive test by @reasonsolo in #11896
  • [TRTLLM-10956][infra] Support build-only mode for GenPostMergeBuilds job by @mzweilz in #11895
  • [#11755][feat] AutoDeploy onboarding agent + Kimi K2.5 AD modeling code by @bmarimuthu-nv in #11780
  • [None][fix] Prevent RuntimeError from dict mutation during iteration in EXAONE MoE weight mapper by @Bias92 in #11862
  • [TRTLLM-11101][feat] VisualGen benchmarking script by @zhenhuaw-me in #11651
  • [https://nvbugs/5820734][fix] Run extra general warmup to warm up memory pool by @liji-nv in #10340
  • [None][fix] Fix nemotron super MTP crash on SM90 by @sunnyqgg in #11807
  • [None][chore] Use cluster service discover in disagg CI tests by @ekou24 in #11242
  • [None][feat] External Drafter One Model by @IzzyPutterman in #11758
  • [None][chore] Update model list by @tcherckez-nvidia in #11827
  • [#11578][fix] Use string stop/bad words in gRPC proto instead of pre-tokenized TokenSequence by @CatherineSue in #11888
  • [None][feat] Add support for bidirectional sliding window attention mask to fmha_v2 by @djns99 in #11212
  • [TRTLLM-11036][feat] Enable new moe test and clean the legacy moe test in the CI by @xxi-nv in #11817
  • [None][infra] Waive 4 failed cases for main in post-merge 2571 by @ZhanruiSunCh in #11968
  • [None][test] Fix deepseek-r1 OOM issue for H100 perf test by @yufeiwu-nv in #11948
  • [None][fix] Remove incorrect Python import style rule from AGENTS.md by @yuxianq in #11940
  • [https://nvbugs/5896577][fix] fix bug of mistral large3 with eagle by @byshiue in #11942
  • [https://nvbugs/5819048][fix] unwaive test of qwen3-235b eagle3 by @byshiue in #11969
  • [None][feat] Avoid duplicated computation with ADP + Helix CP in GQA by @brb-nv in #11891
  • [https://nvbugs/5624818][fix] Add unittest for GPT-OSS non-paged_context_fmha by @pengbowang-nv in #11415
  • [#10245][feat] AutoDeploy: Support Finegrained FP8 quantization by @bmarimuthu-nv in #10897
  • [TRTLLM-11284][infra] Move large models test to post-merge by @EmmaQiaoCh in #11933
  • [TRTLLM-11155][infra] Run multi-GPU tests even single-GPU tests are failed when use --disable-fail-fast by @yiqingy0 in #11740
  • [None][fix] Refine tests/unittest/_torch/flashinfer/test_trtllm_flashinfer_symbol_collision.py to reduce jit-compile time by @yihwang-nv in #11890
  • [#11422][feat] AutoDeploy: Piecewise cudagraph support Prototype by @nvchenghaoz in #11515
  • [TRTLLM-11189][fix] VisualGen isolated TeaCache Wan fix by @o-stoner in #11964
  • [https://nvbugs/5846166][fix] Update Perf Triage Scripts to Fix gen_only issue by @chenfeiz0326 in #11802
  • [TRTLLM-11057][feat] Add Helix CP support for DSV3.2 by @brb-nv in #11507
  • [#2912][feat] Support Cohere Command A model by @torotoki in #11505
  • [TRTLLM-11259][perf] Parallel VAE harness and implementation for WAN by @NVShreyas in #11875
  • [#11578][feat] support multimodal image input in gRPC server by @CatherineSue in #11800
  • [TRTLLM-11093][feat] add 5D A2A for fused ulysses by @NVShreyas in #11787
  • [TRTLLM-11189][fix] Fix TeaCache broken caching for FLUX.1 and FLUX.2 by @karljang in #11868
  • [None][refactor] Request management in ScheduledRequests by @Funatiq in #11784
  • [None][perf] Add Triton FP8 blockwise quant kernel and autotuner bucket-skip for visual gen by @chang-l in #11854
  • [TRTLLM-11290][feat] Enable trtllm-serve E2E tests by @JunyiXu-nv in #11985
  • [None][feat] Optimize by fuse nvfp4_quant to layernorm_gated for mamba2_mixer by @Wanli-Jiang in #11473
  • [None][chore] Autodeploy: add models for sprint by @nvchenghaoz in #11999
  • [None][infra] Update CI allow list 20260305 by @yuanjingx87 in #11965
  • [None][chore] Mass integration of release/1.2 weekly - 6th by @dominicshanshan in #11934
  • [None][fix] Fix Collect Perf Sanity Result's import requests Error by @chenfeiz0326 in #12002
  • [TRTLLM-10956][infra] Skip updating gitlab status for GenPostMergeBuilds by @mzweilz in #11954
  • [None][feat] add ReLU2 NVFP4 fusion for AutoDeploy with tests by @tcherckez-nvidia in #11957
  • [TRTLLM-11159][feat] Wire KVCacheBlock to UnifiedBlockTree, replacing mPrevBlock/mNextBlocks with lookup-node pointers. by @SimengLiu-nv in #11919
  • [#11166][infra] AutoDeploy: improve test organization in CI and add overview doc by @lucaslie in #11291
  • [None][chore] Model update 260308 by @tcherckez-nvidia in #12011
  • [None][infra] Update AutoDeploy CODEOWNERS coverage by @lucaslie in #12013
  • [https://nvbugs/5732958][bug] Fix TestLlama4MinLatency::test_llama_allclose_to_hf failure by @nvpohanh in #10191
  • [None][chore] Unwaive some skip for trtllm moe backend by @leslie-fang25 in #11975
  • [TRTLLM-11134][feat] export VisualGen API and update doc by @zhenhuaw-me in #11911
  • [https://nvbugs/5823783][test] add qa test case for trust-remote-code on multinode failure by @crazydemo in #11905
  • [None][feat] Use max_gpu_total_bytes to control v2's capacity by @jiaganc in #11907
  • [TRTLLM-11342][fix] Fix FLUX.1 TeaCache polynomial coefficients and default t… by @karljang in #12007
  • [None][fix] Use try/except fallback for Pydantic ValidatorIterator in chat message parsing by @Wanli-Jiang in #11903
  • [None][infra] Unwaive 2 cases on rtx-pro-6000d by @EmmaQiaoCh in #12003
  • [TRTLLM-11276][chore] Expose use_python_scheduler in SchedulerConfig and add UTs/ITs for python scheduler by @lancelly in #11884
  • [None][infra] Waive 7 failed cases for main in post-merge 2576 by @ZhanruiSunCh in #12014
  • [https://nvbugs/5948878][fix] Implement workaround for ClientPayloadError by @yingguo-trt in #12018
  • [TRTLLM-10407][feat] Integrate CuTE DSL top-k kernel for Blackwell by @limin2021 in #11900
  • [TRTLLM-11148][perf] _prepare_inputs host time optimization by @hyukn in #11704
  • [None][test] Fix model_name starcoder_15b is not in allowed_models issue by @yufeiwu-nv in #11981
  • [None][infra] Waive 5 failed cases for main in post-merge 2578 by @ZhanruiSunCh in #12023
  • [None][chore] AutoDeploy: re-enable nvfp4 superv3 accuracy test by @galagam in #11945
  • [None][chore] Remove visual_gen benchmark test from YAML by @zhenhuaw-me in #12027
  • [None][fix] Fix the model list as it had a dup model by @tcherckez-nvidia in #12029
  • [https://nvbugs/5863806][fix] Fix Python string truthiness bug in FMHA cubin selection by @luyiyun1021 in #11909
  • [None][feat] Upgrade xgrammar from 0.1.25 to 0.1.32 by @sunnyqgg in #12016
  • [https://nvbugs/5924144][test] unwaive cpp/test_unit_tests.py::test_unit_tests[kernels-80] by @Funatiq in #11902
  • [None][chore] limit tileiras to CUDA13.1 by @tburt-nv in #12042
  • [None][feat] Add silu to trtllm-gen MoE by @IwakuraRein in #11663
  • [TRTLLM-11045][feat] Integrate SA with EAGLE3 and PARD by @cascade812 in #11878
  • [None][chore] waive test_visual_gen_quickstart by @tburt-nv in #12043
  • [None][feat] NIXL support for hybrid model cache transfer by @NVShreyas in #11608

New Contributors

Full Changelog: v1.3.0rc6...v1.3.0rc7
