github NVIDIA/TensorRT-LLM v0.20.0rc0

pre-release, 7 months ago

Highlights

  • Model Support
    • Added Nemotron-H model support (#3430)
    • Added Dynasor-CoT in scaffolding examples (#3501)
  • Features
    • Added stream generation task scaffolding examples (#3527)
    • Added unfused RoPE support in MLA (#3610)
    • Multimodal models
      • Added support in trtllm-serve (#3590)
      • Added support in trtllm-bench; support is currently limited to image inputs (#3490)
    • [Experimental] The TensorRT-LLM Triton backend now supports the LLM API (triton-inference-server/tensorrtllm_backend#742)
  • Performance
    • Optimized large embedding tables in multimodal models (#3380)
  • Infra
    • Upgraded the datasets dependency to version 3.1.0 (#3490)
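
For the datasets upgrade listed under Infra, a minimal sketch of how an environment could be checked against version 3.1.0. This is an illustration only, not part of TensorRT-LLM: the helper names (parse_version, installed_version, meets_requirement) are hypothetical, and treating 3.1.0 as a minimum rather than an exact pin is an assumption, since the release note does not say which applies.

```python
from importlib import metadata

# Version named in the Infra note; treating it as a minimum is an assumption.
REQUIRED = "3.1.0"


def parse_version(text):
    """Turn '3.1.0' into (3, 1, 0).

    Parsing stops at the first non-numeric component, so suffixes such as
    'rc1' are ignored. Short versions are padded with zeros to three parts.
    """
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def meets_requirement(installed, required=REQUIRED):
    """True when the installed version is at least the required one."""
    if installed is None:
        return False
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    found = installed_version("datasets")
    status = "ok" if meets_requirement(found) else "needs upgrade"
    print("datasets:", found, status)
```

Run as a script, it prints the installed datasets version (or None) together with whether it satisfies the assumed minimum.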

Full Changelog: v0.19.0rc0...v0.20.0rc0
