GitHub: NVIDIA/Model-Optimizer 0.37.0
ModelOpt 0.37.0 Release


Deprecations

  • Deprecated ModelOpt's custom docker images. Please use the PyTorch, TensorRT-LLM, or TensorRT docker image directly or refer to the installation guide for more details.
  • Deprecated quantize_mode argument in examples/onnx_ptq/evaluate.py to support strong typing. Use engine_precision instead.
  • Deprecated TRT-LLM's TRT backend in examples/llm_ptq and examples/vlm_ptq. Support for the build and benchmark tasks has been removed and replaced with quant. engine_dir has been replaced with checkpoint_dir in examples/llm_ptq and examples/vlm_ptq. For performance evaluation, please use trtllm-bench directly.
  • The --export_fmt flag in examples/llm_ptq is removed. By default, we export to the unified Hugging Face checkpoint format.
  • Deprecated examples/vlm_eval as it depends on the deprecated TRT-LLM's TRT backend.
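
For the evaluate.py change above, migration is a flag rename. The invocation below is illustrative only (model path and precision value are placeholders, and it assumes the argument is passed as a standard CLI flag):

```shell
# Old (deprecated): pass the precision via --quantize_mode
# python examples/onnx_ptq/evaluate.py --quantize_mode ...

# New: use --engine_precision instead
python examples/onnx_ptq/evaluate.py --engine_precision fp16 ...
```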

New Features

  • high_precision_dtype now defaults to fp16 in ONNX quantization, i.e., the weights of the quantized output model are now FP16 by default.
  • Upgraded TensorRT-LLM dependency to 1.1.0rc2.
  • Support for Phi-4-multimodal and Qwen2.5-VL quantized HF checkpoint export in examples/vlm_ptq.
  • Support storing and restoring Minitron pruning activations and scores for re-pruning without running the forward loop again.
  • Added Minitron pruning example for the Megatron-LM framework. See examples/megatron-lm for more details.
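
As a rough sketch of why caching Minitron pruning scores helps: the expensive part of activation-based pruning is the calibration forward loop that ranks components, so persisting its output lets you re-prune to a different target size without repeating it. The snippet below is a generic illustration of that pattern; all names (compute_activation_scores, get_scores, prune, the cache file) are hypothetical, not the actual ModelOpt API — see examples/megatron-lm for real usage.

```python
import os
import pickle

SCORE_CACHE = "minitron_scores.pkl"  # hypothetical cache file name


def compute_activation_scores(model):
    # Stand-in for the expensive calibration forward loop that scores
    # components (layers, heads, MLP channels). Here we just fake a score.
    return {name: float(len(name)) for name in model}


def get_scores(model, cache_path=SCORE_CACHE):
    # Restore previously computed scores if available, so re-pruning with a
    # different target size does not need to re-run the forward loop.
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    scores = compute_activation_scores(model)
    with open(cache_path, "wb") as f:
        pickle.dump(scores, f)
    return scores


def prune(model, scores, keep):
    # Keep the `keep` highest-scoring components (hypothetical criterion).
    ranked = sorted(model, key=lambda name: scores[name], reverse=True)
    return ranked[:keep]
```

On a second run with the cache file present, get_scores returns immediately from disk, so trying several pruning targets costs one calibration pass instead of one per target.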
