github NVIDIA/Model-Optimizer 0.25.0
ModelOpt 0.25.0 Release


Deprecations

  • Deprecate Torch 2.1 support.
  • Deprecate humaneval benchmark in llm_eval examples. Please use the newly added simple_eval instead.
  • Deprecate fp8_naive quantization format in llm_ptq examples. Please use fp8 instead.

New Features

  • Support the fast Hadamard transform in the TensorQuantizer class (modelopt.torch.quantization.nn.modules.TensorQuantizer).
    It can be used for rotation-based quantization methods, e.g. QuaRot. Users need to install the fast_hadamard_transform package to use this feature.
  • Add affine quantization support for the KV cache, resolving the low accuracy issue in models such as Qwen2.5 and Phi-3/3.5.
  • Add FSDP2 support. FSDP2 can now be used for QAT.
  • Add LiveCodeBench and Simple Evals to the llm_eval examples.
  • Disable saving the ModelOpt state in the unified HF export APIs by default, i.e., add a save_modelopt_state flag to the export_hf_checkpoint API that defaults to False.
  • Add FP8 and NVFP4 real quantization support, with an LLM QLoRA example.
  • The modelopt.deploy.llm.LLM class now supports the tensorrt_llm._torch.LLM backend for quantized HuggingFace checkpoints.
  • Add NVFP4 PTQ example for DeepSeek-R1.
  • Add end-to-end AutoDeploy example for AutoQuant LLM models.
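To illustrate the rotation idea behind the new fast Hadamard transform support: rotation-based methods such as QuaRot apply an orthogonal Hadamard rotation to weights/activations before quantization so that outliers are spread evenly across channels. This is a minimal pure-Python sketch of a normalized Walsh-Hadamard transform; ModelOpt's feature instead wraps the optimized `fast_hadamard_transform` CUDA kernel, so the function name and recursive form here are illustrative only.

```python
import math

def hadamard_transform(x):
    """Normalized Walsh-Hadamard transform of a list whose length is a
    power of two. The transform is orthonormal (it preserves the L2 norm)
    and is its own inverse, which is why rotation-based quantization can
    undo the rotation exactly after dequantization."""
    n = len(x)
    if n == 1:
        return list(x)
    half = n // 2
    a = hadamard_transform(x[:half])
    b = hadamard_transform(x[half:])
    # Butterfly combine; dividing by sqrt(2) at each level keeps the
    # overall transform orthonormal (total factor 1/sqrt(n)).
    s = math.sqrt(2)
    return [(a[i] + b[i]) / s for i in range(half)] + \
           [(a[i] - b[i]) / s for i in range(half)]
```

Because the transform is orthonormal and symmetric, applying it twice returns the original vector, e.g. a spike `[1, 0, 0, 0]` is spread to `[0.5, 0.5, 0.5, 0.5]` and recovered by a second application.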
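On the affine KV-cache quantization item: "affine" means quantizing with both a scale and a zero-point, so an asymmetric value range (common in KV-cache activations for models like Qwen2.5 and Phi-3/3.5) can use the full integer grid instead of wasting half of it. This is a generic sketch of affine quantize/dequantize in pure Python, not ModelOpt's actual KV-cache implementation; the function names are hypothetical.

```python
def affine_quantize(x, num_bits=8):
    """Affine quantization: q = clamp(round(v / scale) + zero_point).
    The zero_point shifts the integer grid so that an asymmetric range
    [min(x), max(x)] maps onto [0, 2**num_bits - 1] without clipping."""
    qmin, qmax = 0, (1 << num_bits) - 1
    lo, hi = min(x), max(x)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # guard against a constant input
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in x]
    return q, scale, zero_point

def affine_dequantize(q, scale, zero_point):
    """Inverse mapping: v ≈ (q - zero_point) * scale."""
    return [(v - zero_point) * scale for v in q]
```

The round-trip error is bounded by one quantization step (`scale`), which is the property that recovers accuracy on asymmetric KV-cache distributions compared with symmetric (zero-point-free) quantization.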
