Announcement Highlights
- Feature
  - Add MNNVL AlltoAll tests to pre-merge (#7465)
  - Support multi-threaded tokenizers for trtllm-serve (#7515)
  - FP8 Context MLA integration (#7581)
  - Support block-wise FP8 in wide EP (#7423)
  - Cherry-pick Responses API and multiple postprocess workers support for chat harmony (#7600)
  - Make use_low_precision_moe_combine an LLM arg (#7598)
- Documentation
  - Update deployment guide and cherry-pick CI test fix from main (#7623)
What's Changed
- [None][test] Add MNNVL AlltoAll tests to pre-merge by @kaiyux in #7465
- [TRTLLM-7292][feat] Support multi-threaded tokenizers for trtllm-serve by @nv-yilinf in #7515
- [None][fix] trtllm-serve yaml loading by @Superjomn in #7551
- [None][chore] Bump version to 1.1.0rc2.post2 by @yiqingy0 in #7582
- [https://nvbugs/5498967][fix] Downgrade NCCL by @yizhang-nv in #7556
- [TRTLLM-6994][feat] FP8 Context MLA integration. by @yuxianq in #7581
- [TRTLLM-7831][feat] Support block wise FP8 in wide ep by @xxi-nv in #7423
- [None][chore] Make use_low_precision_moe_combine as a llm arg by @zongfeijing in #7598
- [None][fix] Update deployment guide and cherry-pick CI test fix from main by @dongfengy in #7623
- [None][feat] Cherry-pick Responses API and multiple postprocess workers support for chat harmony by @JunyiXu-nv in #7600
- [None][chore] Fix kernel launch param and add TRTLLM MoE backend test by @pengbowang-nv in #7524
New Contributors
Full Changelog: v1.1.0rc2.post1...v1.1.0rc2.post2