Announcement Highlights:
- Model Support
  - Add fp8 support for Mistral Small 3.1 (#6731)
- Benchmark
- Feature
  - Update TargetInfo to accommodate CP in disagg (#7224)
  - Merge add sparse exp and shared exp into local reduction (#7369)
  - Support NVFP4 KV Cache (#6244)
  - Allocate MoE workspace only when necessary (release/1.0 retargeted) (#6955)
  - Implement capturable drafting loops for speculation (#7100)
  - Revert phi4-mm aggregate mode (#6907)
  - Complete the last missing allreduce op in Llama3/4 (#6850)
- Documentation
What's Changed
- [None][doc] Exposing the ADP balance strategy tech blog by @juney-nvidia in #7380
- [None][feat] Update TargetInfo to accommodate CP in disagg by @brb-nv in #7224
- [None][docs] Update Dynasor paper info by @AndyDai-nv in #7137
- [None][fix] store blog 10 media via lfs by @Funatiq in #7375
- [TRTLLM-7250][fix] Add failed cases into waives.txt by @xinhe-nv in #7342
- [None][chore] Bump version to 1.1.0rc3 by @yiqingy0 in #7394
- [TRTLLM-6747][feat] Merge add sparse exp and shared exp into local reduction by @zongfeijing in #7369
- [None][feat] Support NVFP4 KV Cache by @Tom-Zheng in #6244
- [None][ci] Some improvements for Slurm CI setup by @chzblych in #7407
- [None][chore] Mass integration of release/1.0 - 2nd by @dominicshanshan in #7171
- [None][test] Update cases that do not support passing fp8 quantization for the PyTorch backend by @nvamyt in #7302
- [None][infra] Disable GB200-PyTorch-1 due to OOM issue by @yuanjingx87 in #7386
- [https://nvbugs/5481087][fix] fix bug of ci when we use mocker by @byshiue in #7332
- [None][infra] Waive failed case on main 0901 by @EmmaQiaoCh in #7447
- [TRTLLM-7353][feat] Implement capturable drafting loops for speculation by @mikeiovine in #7100
- [None][doc] Update DeepSeek example doc by @jiahanc in #7358
- [None][fix] Fix nanobind failure by @Tom-Zheng in #7425
- [None][chore] Use llm args in create_py_executor by @leslie-fang25 in #7239
New Contributors
- @AndyDai-nv made their first contribution in #7137
Full Changelog: v1.1.0rc2...v1.1.0rc3