Deprecations
- Deprecate Torch 2.1 support.
- Deprecate the `humaneval` benchmark in the `llm_eval` examples. Please use the newly added `simple_eval` instead.
- Deprecate the `fp8_naive` quantization format in the `llm_ptq` examples. Please use `fp8` instead.
New Features
- Support fast Hadamard transform in the `TensorQuantizer` class (`modelopt.torch.quantization.nn.modules.TensorQuantizer`). It can be used for rotation-based quantization methods, e.g. QuaRot. Users need to install the `fast_hadamard_transform` package to use this feature.
- Add affine quantization support for the KV cache, resolving the low accuracy issue in models such as Qwen2.5 and Phi-3/3.5.
- Add FSDP2 support. FSDP2 can now be used for QAT.
- Add LiveCodeBench and Simple Evals to the `llm_eval` examples.
- Disable saving the modelopt state in the unified HF export APIs by default, i.e., add a `save_modelopt_state` flag to the `export_hf_checkpoint` API that defaults to `False`.
- Add FP8 and NVFP4 real quantization support with an LLM QLoRA example.
- The `modelopt.deploy.llm.LLM` class now supports the `tensorrt_llm._torch.LLM` backend for quantized HuggingFace checkpoints.
- Add an NVFP4 PTQ example for DeepSeek-R1.
- Add end-to-end AutoDeploy example for AutoQuant LLM models.
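As background for the fast Hadamard transform entry above: rotation-based methods such as QuaRot multiply weights and activations by a Hadamard matrix to spread outliers across channels before quantizing. The `fast_hadamard_transform` package provides a CUDA kernel for this; the pure-Python sketch below only illustrates the underlying O(n log n) Walsh-Hadamard transform and is not modelopt's implementation.

```python
def fwht(x):
    """Fast Walsh-Hadamard transform (natural/Hadamard ordering).

    Equivalent to multiplying the length-n vector x (n a power of two)
    by the n x n Hadamard matrix, but in O(n log n) butterfly steps
    instead of an O(n^2) matrix-vector product.
    """
    x = list(x)
    n = len(x)
    assert n and n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        # Butterfly step: combine pairs that are h apart.
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x
```

Because the Hadamard matrix is orthogonal up to a factor of n, applying `fwht` twice returns the input scaled by its length, which is what makes the rotation invertible after quantization.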
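On the affine KV-cache quantization entry: unlike symmetric (scale-only) quantization, affine quantization adds a zero-point so an asymmetric value range still maps onto the full integer grid, which is what recovers accuracy for models whose KV-cache values are skewed. The following is a minimal, illustrative pure-Python sketch of per-tensor affine int8 quantization, not modelopt's actual KV-cache code; all function names here are hypothetical.

```python
def affine_quantize(values, num_bits=8):
    """Illustrative affine (scale + zero-point) quantization to unsigned ints.

    Maps the observed range [min, max] onto [0, 2**num_bits - 1], so an
    asymmetric range (e.g. [-1, 2]) uses every quantization level.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid zero scale for constant input
    zero_point = round(qmin - lo / scale)     # integer that represents real value 0
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def affine_dequantize(q, scale, zero_point):
    """Inverse mapping back to (approximate) real values."""
    return [(qi - zero_point) * scale for qi in q]
```

A symmetric scheme centered at zero would waste levels on one side of an asymmetric range; the zero-point shifts the grid so both range endpoints land exactly on representable values.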