NVIDIA/TensorRT-LLM v1.1.0rc2.post1

Pre-release

Announcement Highlights:

  • API
    • Update TargetInfo to accommodate CP in disagg (#7224)
  • Benchmark
    • Minor fixes to slurm and benchmark scripts (#7453)
  • Feature
    • Support DeepGEMM swap-AB on sm100 (#7355)
    • Merge add sparse exp and shared exp into local re… (#7422)
    • Add batch waiting when scheduling (#7287)
    • Reuse pytorch memory segments occupied by cudagraph pool (#7457); a sketch of the mechanism follows these highlights
    • Complete the last missing allreduce op in Llama3/4 (#7420)
  • Documentation
    • Exposing the ADP balance strategy tech blog (#7380)
    • Update Dynasor paper info (#7137)
    • Store blog 10 media via LFS (#7375)

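The cudagraph item above (#7457) concerns PyTorch's CUDA graph memory pools. The snippet below is a minimal sketch of the general pool-sharing mechanism in stock PyTorch only, not the PR's implementation; `capture_shared_pool_graphs` is a hypothetical helper name.

```python
import torch

def capture_shared_pool_graphs():
    # Capture two CUDA graphs into one shared memory pool, so the second
    # capture can reuse segments the pool already owns instead of
    # reserving a separate region.
    # (Real code typically warms up the captured ops on a side stream first.)
    pool = torch.cuda.graph_pool_handle()  # handle to a shared allocation pool
    static_in = torch.zeros(1024, device="cuda")

    g1 = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g1, pool=pool):
        out1 = static_in * 2

    g2 = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g2, pool=pool):  # same pool: segments are shared
        out2 = static_in + 1

    g1.replay()  # replays run the captured kernels on the shared segments
    g2.replay()
    return out1, out2
```
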
What's Changed

  • [None][doc] Exposing the ADP balance strategy tech blog by @juney-nvidia in #7380
  • [None][feat] Update TargetInfo to accommodate CP in disagg by @brb-nv in #7224
  • [None][docs] Update Dynasor paper info by @AndyDai-nv in #7137
  • [None][fix] Store blog 10 media via LFS by @Funatiq in #7375
  • [TRTLLM-7250][fix] Add failed cases into waives.txt by @xinhe-nv in #7342
  • [None][chore] bump version to 1.1.0rc2.post1 by @litaotju in #7396
  • [TRTLLM-6747][feat] Merge add sparse exp and shared exp into local re… by @zongfeijing in #7422
  • [None][fix] Fix nsys in slurm scripts by @kaiyux in #7409
  • [None][feat] Support DeepGEMM swap-AB on sm100 by @Barry-Delaney in #7355
  • [None][fix] Minor fixes to slurm and benchmark scripts by @kaiyux in #7453
  • [None][fix] Fix possible mpi broadcast and gather issue on large object by @dongxuy04 in #7507 (see the MPI sketch after this list)
  • [TRTLLM-7008][fix] Add automatic shared memory deletion if it already exists by @dongxuy04 in #7377 (see the shared-memory sketch after this list)
  • [None][ci] Cherry-pick some improvements for Slurm CI setup from main branch by @chzblych in #7479
  • [https://nvbugs/5481434][feat] Reuse pytorch memory segments occupied by cudagraph pool by @HuiGao-NV in #7457
  • [None][fix] Update DG side branch name by @Barry-Delaney in #7491
  • [None][fix] Update DG commit by @Barry-Delaney in #7534
  • [None][fix] Fix a typo in the Slurm CI codes (#7485) by @chzblych in #7538
  • [https://nvbugs/5488582][fix] Avoid unexpected Triton recompilation in DG fused_moe. by @hyukn in #7495
  • [None][fix] Cherry-pick 6850: Complete the last missing allreduce op in Llama3/4. by @hyukn in #7420
  • [None][opt] Add batch waiting when scheduling by @yunruis in #7287
  • [https://nvbugs/5485325][fix] Add a postprocess to the model engine to fix the CUDA graph warmup issue when using speculative decoding by @lfr-0531 in #7373
  • [None][fix] Cherry-Pick MNNVLAllreduce Fixes into release/1.1.0rc2 branch by @timlee0212 in #7487
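
The shared-memory cleanup in #7377 maps onto a standard Python pattern: when a named segment is left behind by a crashed process, unlink it before re-creating it. Here is a minimal sketch of that pattern using Python's `multiprocessing.shared_memory`; `create_shared_block` is a hypothetical name, not TensorRT-LLM's API.

```python
from multiprocessing import shared_memory

def create_shared_block(name: str, size: int) -> shared_memory.SharedMemory:
    # Create a named shared-memory segment, deleting any stale segment
    # with the same name (e.g. left behind by a crashed process) first.
    try:
        return shared_memory.SharedMemory(name=name, create=True, size=size)
    except FileExistsError:
        stale = shared_memory.SharedMemory(name=name)  # attach to the leftover
        stale.close()
        stale.unlink()  # remove the stale segment from the system
        return shared_memory.SharedMemory(name=name, create=True, size=size)
```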

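The large-object fix in #7507 touches a known MPI pitfall: element counts in a single MPI call are 32-bit integers, so pickle-based broadcasts can fail once the serialized payload grows past that limit. A common workaround, sketched here as a general technique and not as the PR's actual fix (`bcast_large` is a hypothetical helper), is to send the pickled bytes in bounded chunks:

```python
import pickle
from mpi4py import MPI

_CHUNK = 1 << 30  # stay well below the 32-bit element-count limit

def bcast_large(obj, comm: MPI.Comm, root: int = 0):
    # Broadcast an arbitrarily large picklable object by sending its
    # pickled bytes in bounded chunks instead of one giant MPI_Bcast.
    rank = comm.Get_rank()
    buf = bytearray(pickle.dumps(obj)) if rank == root else None
    size = comm.bcast(len(buf) if rank == root else None, root=root)
    if rank != root:
        buf = bytearray(size)
    view = memoryview(buf)
    for off in range(0, size, _CHUNK):
        comm.Bcast([view[off:off + _CHUNK], MPI.BYTE], root=root)
    return obj if rank == root else pickle.loads(view)
```
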
Full Changelog: v1.1.0rc2...v1.1.0rc2.post1
