NVIDIA/TensorRT-LLM v1.1.0rc2.post2

Pre-release · 16 hours ago

Announcement Highlights

  • Feature
    • Add MNNVL AlltoAll tests to pre-merge (#7465)
    • Support multi-threaded tokenizers for trtllm-serve (#7515)
    • FP8 Context MLA integration (#7581)
    • Support block wise FP8 in wide ep (#7423)
    • Cherry-pick Responses API and multiple postprocess workers support for chat harmony (#7600)
    • Expose low_precision_combine as an LLM arg (#7598)
  • Documentation
    • Update deployment guide and cherry-pick CI test fix from main (#7623)
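
The multi-threaded tokenizer support for trtllm-serve (#7515) follows a common serving pattern: fan incoming requests out across a pool of tokenizer workers so tokenization does not serialize behind one thread. A minimal, generic sketch of that pattern using only the Python standard library (the `toy_tokenize` function is a hypothetical stand-in, not the trtllm-serve implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def toy_tokenize(text):
    # Stand-in tokenizer: whitespace split. A real server would call a fast
    # tokenizer here, which can release the GIL during encoding so the
    # worker threads actually run concurrently.
    return text.split()

def tokenize_batch(texts, num_workers=4):
    # Fan requests out across worker threads; map() preserves input order.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(toy_tokenize, texts))

tokens = tokenize_batch(["hello world", "multi threaded tokenizers"])
```

The ordering guarantee of `map()` matters for a server: responses must be matched back to the requests that produced them.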

What's Changed

  • [None] [test] Add MNNVL AlltoAll tests to pre-merge by @kaiyux in #7465
  • [TRTLLM-7292][feat] Support multi-threaded tokenizers for trtllm-serve by @nv-yilinf in #7515
  • [None][fix] trtllm-serve yaml loading by @Superjomn in #7551
  • [None][chore] Bump version to 1.1.0rc2.post2 by @yiqingy0 in #7582
  • [https://nvbugs/5498967][fix] Downgrade NCCL by @yizhang-nv in #7556
  • [TRTLLM-6994][feat] FP8 Context MLA integration. by @yuxianq in #7581
  • [TRTLLM-7831][feat] Support block wise FP8 in wide ep by @xxi-nv in #7423
  • [None][chore] Make use_low_precision_moe_combine an LLM arg by @zongfeijing in #7598
  • [None][fix] Update deployment guide and cherry-pick CI test fix from main by @dongfengy in #7623
  • [None][feat] Cherry-pick Responses API and multiple postprocess workers support for chat harmony by @JunyiXu-nv in #7600
  • [None][chore] Fix kernel launch param and add TRTLLM MoE backend test by @pengbowang-nv in #7524
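
Block-wise FP8 (#7423) scales a tensor per fixed-size block rather than per tensor, so a single outlier only widens the dynamic range of its own block. A minimal pure-Python sketch of the scaling idea, assuming 1-D blocks and the FP8 E4M3 maximum of 448; this is an illustration of the general technique, not the wide-EP kernel:

```python
FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3

def blockwise_scales(x, block=128):
    # Compute one scale per block, mapping each block's absolute maximum
    # onto the FP8 representable range. Assumes len(x) divides evenly.
    assert len(x) % block == 0
    scales = []
    for i in range(0, len(x), block):
        amax = max(abs(v) for v in x[i:i + block])
        scales.append(amax / FP8_E4M3_MAX)
    return scales
```

Quantization then divides each block by its scale before casting to FP8, and dequantization multiplies the scale back in.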

Full Changelog: v1.1.0rc2.post1...v1.1.0rc2.post2
