NVIDIA/TensorRT-LLM v1.1.0rc3 on GitHub

Announcement Highlights:

Model Support
- Add fp8 support for Mistral Small 3.1 (#6731)
Benchmark
- Add benchmark TRT flow test for MIG (#6884)
- Mistral Small 3.1 accuracy tests (#6909)
Feature
- Update TargetInfo to accommodate CP in disagg (#7224)
- Merge add sparse exp and shared exp into local reduction (#7369)
- Support NVFP4 KV Cache (#6244)
- Allocate MoE workspace only when necessary (release/1.0 retargeted) (#6955)
- Implement capturable drafting loops for speculation (#7100)
- Revert phi4-mm aggregate mode (#6907)
- Complete the last missing allreduce op in Llama3/4 (#6850)
Documentation
- Expose the ADP balance strategy tech blog (#7380)
- Update Dynasor paper info (#7137)
- Add docs for Gemma3 VLMs (#6880)
- Add legacy section for TensorRT engine (#6724)
- Update DeepSeek example doc (#7358)

[None][doc] Exposing the ADP balance strategy tech blog by @juney-nvidia in #7380
[None][feat] Update TargetInfo to accommodate CP in disagg by @brb-nv in #7224
[None][docs] Update Dynasor paper info by @AndyDai-nv in #7137
[None] [fix] store blog 10 media via lfs by @Funatiq in #7375
[TRTLLM-7250][fix] Add failed cases into waives.txt by @xinhe-nv in #7342
[None][chore] Bump version to 1.1.0rc3 by @yiqingy0 in #7394
[TRTLLM-6747][feat] Merge add sparse exp and shared exp into local reduction by @zongfeijing in #7369
[None][feat] Support NVFP4 KV Cache by @Tom-Zheng in #6244
[None][ci] Some improvements for Slurm CI setup by @chzblych in #7407
[None][chore] Mass integration of release/1.0 - 2nd by @dominicshanshan in #7171
[None][test] Update case that does not support passing quantization fp8 for pytorch backend by @nvamyt in #7302
[None][infra] Disable GB200-PyTorch-1 due to OOM issue by @yuanjingx87 in #7386
[https://nvbugs/5481087][fix] fix bug of ci when we use mocker by @byshiue in #7332
[None][infra] Waive failed case on main 0901 by @EmmaQiaoCh in #7447
[TRTLLM-7353][feat] Implement capturable drafting loops for speculation by @mikeiovine in #7100
[None] [doc] Update DeepSeek example doc by @jiahanc in #7358
[None][fix] Fix nanobind failure by @Tom-Zheng in #7425
[None][chore] Use llm args in create_py_executor by @leslie-fang25 in #7239

Full Changelog: v1.1.0rc2...v1.1.0rc3