NVIDIA/TensorRT-LLM v1.1.0rc2.post1

Pre-release

Announcement Highlights:

  • API
    • Update TargetInfo to accommodate CP in disagg (#7224)
  • Benchmark
    • Minor fixes to slurm and benchmark scripts (#7453)
  • Feature
    • Support DeepGEMM swap-AB on sm100 (#7355)
    • Merge add sparse exp and shared exp into local re… (#7422)
    • Add batch waiting when scheduling (#7287)
    • Reuse pytorch memory segments occupied by cudagraph pool (#7457); a sketch of the mechanism follows these highlights
    • Complete the last missing allreduce op in Llama3/4 (#7420)
  • Documentation
    • Exposing the ADP balance strategy tech blog (#7380)
    • Update Dynasor paper info (#7137)
    • Store blog 10 media via LFS (#7375)

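The cudagraph item above (#7457) concerns PyTorch's CUDA graph memory pools. The snippet below is a minimal sketch of the general pool-sharing mechanism in stock PyTorch only, not the PR's implementation; `capture_shared_pool_graphs` is a hypothetical helper name.

```python
import torch

def capture_shared_pool_graphs():
    # Capture two CUDA graphs into one shared memory pool, so the second
    # capture can reuse segments the pool already owns instead of
    # reserving a separate region.
    # (Real code typically warms up the captured ops on a side stream first.)
    pool = torch.cuda.graph_pool_handle()  # handle to a shared allocation pool
    static_in = torch.zeros(1024, device="cuda")

    g1 = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g1, pool=pool):
        out1 = static_in * 2

    g2 = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g2, pool=pool):  # same pool: segments are shared
        out2 = static_in + 1

    g1.replay()  # replays run the captured kernels on the shared segments
    g2.replay()
    return out1, out2
```
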
What's Changed

  • [None][doc] Exposing the ADP balance strategy tech blog by @juney-nvidia in #7380
  • [None][feat] Update TargetInfo to accommodate CP in disagg by @brb-nv in #7224
  • [None][docs] Update Dynasor paper info by @AndyDai-nv in #7137
  • [None][fix] Store blog 10 media via LFS by @Funatiq in #7375
  • [TRTLLM-7250][fix] Add failed cases into waives.txt by @xinhe-nv in #7342
  • [None][chore] bump version to 1.1.0rc2.post1 by @litaotju in #7396
  • [TRTLLM-6747][feat] Merge add sparse exp and shared exp into local re… by @zongfeijing in #7422
  • [None][fix] Fix nsys in slurm scripts by @kaiyux in #7409
  • [None][feat] Support DeepGEMM swap-AB on sm100 by @Barry-Delaney in #7355
  • [None][fix] Minor fixes to slurm and benchmark scripts by @kaiyux in #7453
  • [None][fix] Fix possible mpi broadcast and gather issue on large object by @dongxuy04 in #7507 (see the MPI sketch after this list)
  • [TRTLLM-7008][fix] Add automatic shared memory deletion if it already exists by @dongxuy04 in #7377 (see the shared-memory sketch after this list)
  • [None][ci] Cherry-pick some improvements for Slurm CI setup from main branch by @chzblych in #7479
  • [https://nvbugs/5481434][feat] Reuse pytorch memory segments occupied by cudagraph pool by @HuiGao-NV in #7457
  • [None][fix] Update DG side branch name by @Barry-Delaney in #7491
  • [None][fix] Update DG commit by @Barry-Delaney in #7534
  • [None][fix] Fix a typo in the Slurm CI codes (#7485) by @chzblych in #7538
  • [https://nvbugs/5488582][fix] Avoid unexpected Triton recompilation in DG fused_moe. by @hyukn in #7495
  • [None][fix] Cherry-pick 6850: Complete the last missing allreduce op in Llama3/4. by @hyukn in #7420
  • [None][opt] Add batch waiting when scheduling by @yunruis in #7287
  • [https://nvbugs/5485325][fix] Add a postprocess to the model engine to fix the CUDA graph warmup issue when using speculative decoding by @lfr-0531 in #7373
  • [None][fix] Cherry-Pick MNNVLAllreduce Fixes into release/1.1.0rc2 branch by @timlee0212 in #7487
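
The shared-memory cleanup in #7377 maps onto a standard Python pattern: when a named segment is left behind by a crashed process, unlink it before re-creating it. Here is a minimal sketch of that pattern using Python's `multiprocessing.shared_memory`; `create_shared_block` is a hypothetical name, not TensorRT-LLM's API.

```python
from multiprocessing import shared_memory

def create_shared_block(name: str, size: int) -> shared_memory.SharedMemory:
    # Create a named shared-memory segment, deleting any stale segment
    # with the same name (e.g. left behind by a crashed process) first.
    try:
        return shared_memory.SharedMemory(name=name, create=True, size=size)
    except FileExistsError:
        stale = shared_memory.SharedMemory(name=name)  # attach to the leftover
        stale.close()
        stale.unlink()  # remove the stale segment from the system
        return shared_memory.SharedMemory(name=name, create=True, size=size)
```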

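The large-object fix in #7507 touches a known MPI pitfall: element counts in a single MPI call are 32-bit integers, so pickle-based broadcasts can fail once the serialized payload grows past that limit. A common workaround, sketched here as a general technique and not as the PR's actual fix (`bcast_large` is a hypothetical helper), is to send the pickled bytes in bounded chunks:

```python
import pickle
from mpi4py import MPI

_CHUNK = 1 << 30  # stay well below the 32-bit element-count limit

def bcast_large(obj, comm: MPI.Comm, root: int = 0):
    # Broadcast an arbitrarily large picklable object by sending its
    # pickled bytes in bounded chunks instead of one giant MPI_Bcast.
    rank = comm.Get_rank()
    buf = bytearray(pickle.dumps(obj)) if rank == root else None
    size = comm.bcast(len(buf) if rank == root else None, root=root)
    if rank != root:
        buf = bytearray(size)
    view = memoryview(buf)
    for off in range(0, size, _CHUNK):
        comm.Bcast([view[off:off + _CHUNK], MPI.BYTE], root=root)
    return obj if rank == root else pickle.loads(view)
```
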
Full Changelog: v1.1.0rc2...v1.1.0rc2.post1
