github NVIDIA/TensorRT-LLM v1.2.0rc6.post3

pre-release, 13 hours ago

What's Changed

  • [https://nvbugs/5850094][fix] Fix MoE cost estimation for auto multi-stream scheduling by @yizhang-nv in #11160
  • [None][feat] update TRT-LLM Gen DS FP8 MoE cubins and optimize finalize kernel by @nekorobov in #11104
  • [None][chore] Bump version to 1.2.0rc6.post3 by @yiqingy0 in #11224
  • [None][fix] Fallback to NCCL instead of NCCL symmetric by @Tabrizian in #11174
  • [None][feat] fuse shared to sparse experts in TRT-LLM Gen MoE by @nekorobov in #11143

Full Changelog: v1.2.0rc6.post2...v1.2.0rc6.post3
