GitHub NVIDIA/TensorRT-LLM v1.1.0rc3

latest release: v1.1.0rc2.post1
pre-release, one day ago

Announcement Highlights:

  • Model Support
    • Add fp8 support for Mistral Small 3.1 (#6731)
  • Benchmark
    • Add benchmark TRT flow test for MIG (#6884)
    • Mistral Small 3.1 accuracy tests (#6909)
  • Feature
    • Update TargetInfo to accommodate CP in disagg (#7224)
    • Merge add sparse exp and shared exp into local reduction (#7369)
    • Support NVFP4 KV Cache (#6244)
    • Allocate MoE workspace only when necessary (release/1.0 retargeted) (#6955)
    • Implement capturable drafting loops for speculation (#7100)
    • Revert phi4-mm aggregate mode (#6907)
    • Complete the last missing allreduce op in Llama3/4 (#6850)
  • Documentation
    • Exposing the ADP balance strategy tech blog (#7380)
    • Update Dynasor paper info (#7137)
    • Add docs for Gemma3 VLMs (#6880)
    • Add legacy section for TensorRT engine (#6724)
    • Update DeepSeek example doc (#7358)

What's Changed

New Contributors

Full Changelog: v1.1.0rc2...v1.1.0rc3
