Deprecations
- Deprecate Torch 2.1 support.
- Deprecate the `humaneval` benchmark in the `llm_eval` examples. Please use the newly added `simple_eval` instead.
- Deprecate the `fp8_naive` quantization format in the `llm_ptq` examples. Please use `fp8` instead.
New Features
- Support fast Hadamard transform in the `TensorQuantizer` class (`modelopt.torch.quantization.nn.modules.TensorQuantizer`). It can be used for rotation-based quantization methods, e.g. QuaRot. Users need to install the `fast_hadamard_transform` package to use this feature.
- Add affine quantization support for the KV cache, resolving the low accuracy issue in models such as Qwen2.5 and Phi-3/3.5.
- Add FSDP2 support. FSDP2 can now be used for QAT.
- Add LiveCodeBench and Simple Evals to the `llm_eval` examples.
- Disable saving the modelopt state in the unified HF export APIs by default, i.e., add a `save_modelopt_state` flag to the `export_hf_checkpoint` API that defaults to `False`.
- Add FP8 and NVFP4 real quantization support with an LLM QLoRA example.
- The `modelopt.deploy.llm.LLM` class now supports the `tensorrt_llm._torch.LLM` backend for quantized HuggingFace checkpoints.
- Add an NVFP4 PTQ example for DeepSeek-R1.
- Add end-to-end AutoDeploy example for AutoQuant LLM models.
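As background for the fast Hadamard transform entry above: rotation-based methods such as QuaRot multiply weights and activations by a Hadamard matrix to spread outliers across channels before quantizing. The `fast_hadamard_transform` package provides a CUDA kernel for this; the pure-Python sketch below only illustrates the underlying O(n log n) Walsh-Hadamard transform and is not modelopt's implementation.

```python
def fwht(x):
    """Fast Walsh-Hadamard transform (natural/Hadamard ordering).

    Equivalent to multiplying the length-n vector x (n a power of two)
    by the n x n Hadamard matrix, but in O(n log n) butterfly steps
    instead of an O(n^2) matrix-vector product.
    """
    x = list(x)
    n = len(x)
    assert n and n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        # Butterfly step: combine pairs that are h apart.
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x
```

Because the Hadamard matrix is orthogonal up to a factor of n, applying `fwht` twice returns the input scaled by its length, which is what makes the rotation invertible after quantization.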
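On the affine KV-cache quantization entry: unlike symmetric (scale-only) quantization, affine quantization adds a zero-point so an asymmetric value range still maps onto the full integer grid, which is what recovers accuracy for models whose KV-cache values are skewed. The following is a minimal, illustrative pure-Python sketch of per-tensor affine int8 quantization, not modelopt's actual KV-cache code; all function names here are hypothetical.

```python
def affine_quantize(values, num_bits=8):
    """Illustrative affine (scale + zero-point) quantization to unsigned ints.

    Maps the observed range [min, max] onto [0, 2**num_bits - 1], so an
    asymmetric range (e.g. [-1, 2]) uses every quantization level.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid zero scale for constant input
    zero_point = round(qmin - lo / scale)     # integer that represents real value 0
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def affine_dequantize(q, scale, zero_point):
    """Inverse mapping back to (approximate) real values."""
    return [(qi - zero_point) * scale for qi in q]
```

A symmetric scheme centered at zero would waste levels on one side of an asymmetric range; the zero-point shifts the grid so both range endpoints land exactly on representable values.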