## 🚀 New Features & Enhancements

### OpenVINO
- Add MAIRA-2 support by @eaidova in #1145
- Add support for `nf4_f8e4m3` quantization mode by @nikita-savelyevv in #1148
- Add DeepSeek support by @eaidova in #1155
- Add Qwen2.5-VL support by @eaidova in #1163
- Add LLaVA-Next-Video support by @eaidova in #1183
- Add GOT-OCR2 support by @eaidova in #1202
- Add Gemma 3 support by @eaidova in #1198
- Add SmolVLM and Idefics3 support by @eaidova in #1210
- Add Phi-3-MoE support by @eaidova in #1215
- Add OVSamModel for inference by @eaidova in #1229
- Add Phi-4-multimodal support by @eaidova in #1201
- Add Llama 4 support by @eaidova in #1226
- Add zero-shot image classification support by @eaidova in #1273
- Add PTQ support for OVModelForZeroShotImageClassification by @nikita-savelyevv in #1283
- Add full int8 quantization support for diffusers by @l-bat in #1193
- Add SANA-Sprint support by @eaidova in #1245
- Add PTQ support for OVModelForMaskedLM by @nikita-savelyevv in #1268
- Add LTX-Video support by @eaidova in #1264
- Add Qwen3 and Qwen3-MOE support by @openvino-dev-samples in #1214
- Add SpeechT5 text-to-speech support with OpenVINO by @rkazants in #1230
- Add GLM4 support by @openvino-dev-samples in #1249
- Add PTQ support for OVModelForFeatureExtraction and OVSentenceTransformer by @nikita-savelyevv in #1257
- Introduce OVCalibrationDatasetBuilder by @nikita-savelyevv in #1232
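As a quick, hedged illustration of how the new model support above is typically exercised: a model can be exported to OpenVINO IR with weight compression through the `optimum-cli` interface. The model id and output directory below are examples only, and exact flag availability (including the new `nf4_f8e4m3` mode, exposed via the quantization options) depends on the installed optimum-intel and NNCF versions.

```shell
# Export a newly supported model to OpenVINO IR with 4-bit weight compression.
# Model id and output directory are illustrative.
optimum-cli export openvino \
  --model Qwen/Qwen3-0.6B \
  --weight-format int4 \
  qwen3_ov_int4
```

The resulting directory can then be loaded with the corresponding `OVModelFor*` class for inference.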
### IPEX
- Add Qwen2 support by @jiqing-feng in #1107
- Enable quantization model support by @jiqing-feng in #1074
- Add support for flash decoding on xpu by @kaixuanliu in #1118
- Add Phi support by @jiqing-feng in #1175
- Enable compilation for patched model with paged attention by @jiqing-feng in #1253
- Add Mistral modeling optimization support for ipex by @kaixuanliu in #1269
### Transformers compatibility
- Add compatibility with transformers v4.49 by @echarlaix in #1172
- Add compatibility with transformers v4.50 and v4.51 by @IlyasMoutawwakil in #1242
## 🔧 Key Fixes & Optimizations
- Fix misplaced configs saving by @eaidova in #1159
- Check if nncf is installed before running quantization from optimum-cli by @nikita-savelyevv in #1154
- Fix automatic-speech-recognition-with-past quantization from CLI by @nikita-savelyevv in #1180
- Propagate OV*QuantizationConfig kwargs to nncf calls by @nikita-savelyevv in #1179
- Fix model field names for OVBaseModelForSeq2SeqLM by @nikita-savelyevv in #1184
- Align loading dtype logic for diffusers with other models by @eaidova in #1187
- Fix generation for statically reshaped diffusion pipeline by @eaidova in #1199
- Add `ov_submodels` property to `OVBaseModel` by @nikita-savelyevv in #1177
- Fix flux and sana export with diffusers 0.33+ by @eaidova in #1236
- Update pkv precision at save_pretrained call by @nikita-savelyevv in #1235
- Remove ONNX fallback when converting to OpenVINO by @eaidova in #1272
- Fix custom dataset processing for text encoding tasks by @nikita-savelyevv in #1286
- Fix openvino decoder models output by @echarlaix in #1308
## What's Changed
- fix export phi3 with --trust-remote-code by @eaidova in #1147
- Skip test_aware_training_quantization test by @nikita-savelyevv in #1149
- Check if nncf is installed before running quantization from optimum-cli by @nikita-savelyevv in #1154
- enable qwen2 model by @jiqing-feng in #1107
- maira2 support by @eaidova in #1145
- Add slow tests for lower transformers version by @echarlaix in #1144
- fix misplaced configs saving by @eaidova in #1159
- Add default int4 config for DeepSeek-R1-Distill-Llama-8B by @nikita-savelyevv in #1158
- Remove unnecessary SD reload from saved dir by @l-bat in #1162
- resolve complicated chat templates during tokenizer saving by @eaidova in #1151
- Trigger tests for maira2 for compatible transformers version by @echarlaix in #1161
- use Tensor.numpy() instead of np.array(Tensor) by @eaidova in #1153
- [OV] Add support for `nf4_f8e4m3` quantization mode by @nikita-savelyevv in #1148
- support updated chat template for llava-next by @eaidova in #1166
- avoid extra reshaping to max_model_length for unet by @eaidova in #1164
- Enable quant model support by @jiqing-feng in #1074
- [OV] Add default int4 configurations for DeepSeek-R1-Distill-Qwen models by @nikita-savelyevv in #1168
- Deprecate OVTrainer by @nikita-savelyevv in #1167
- Support deepseek models export by @eaidova in #1155
- add support for flash decoding on xpu by @kaixuanliu in #1118
- deprecate TSModelForCausalLM by @echarlaix in #1173
- transformers 4.49 by @echarlaix in #1172
- Update ipex Ci to torch 2.6 by @jiqing-feng in #1176
- add support qwen2.5vl by @eaidova in #1163
- enable phi by @jiqing-feng in #1175
- Add `ov_submodels` property to `OVBaseModel` by @nikita-savelyevv in #1177
- [OV] Fix automatic-speech-recognition-with-past quantization from CLI by @nikita-savelyevv in #1180
- Propagate OV*QuantizationConfig kwargs to nncf calls by @nikita-savelyevv in #1179
- [OV] Add int4 config for Llama-3.1-8b model id aliases by @nikita-savelyevv in #1182
- Fix model field names for OVBaseModelForSeq2SeqLM by @nikita-savelyevv in #1184
- [OV] Enable back phi3_v 4bit compression test by @nikita-savelyevv in #1185
- align loading dtype logic for diffusers with other models by @eaidova in #1187
- attempt to resolve 4.49 compatibility issues and fix input processing… by @eaidova in #1190
- fix logits_to_keep by @jiqing-feng in #1188
- warm up does not work for compiled model by @jiqing-feng in #1189
- Add default int4 configs for Phi-4-mini-instruct and Qwen2.5-7B-Instruct by @nikita-savelyevv in #1194
- add support llava-next-video by @eaidova in #1183
- upgrade transformers to 4.49 for patching models by @jiqing-feng in #1196
- add support got-ocr2 by @eaidova in #1202
- fix generation for statically reshaped diffusion pipeline by @eaidova in #1199
- add gemma3 support by @eaidova in #1198
- enable awq tests by @jiqing-feng in #1195
- fix tests running with nightly by @eaidova in #1205
- fix internvl2 patching for transformers>=4.48 by @eaidova in #1206
- Support full int8 quantization for diffusers by @l-bat in #1193
- Update default int4 config for llama-2-7b-chat-hf by @nikita-savelyevv in #1216
- Patch by @jiqing-feng in #1200
- add falcon3 simplified chat template by @eaidova in #1217
- add support SmolVLM and Idefics3 models by @eaidova in #1210
- fix checking available files if from_onnx=True by @eaidova in #1208
- remove `XPULinearXXX` class definition for ipex by @kaixuanliu in #1212
- phi3moe support by @eaidova in #1215
- Fix crash issue of IPEX XPU's rotary_embedding API by @kaixuanliu in #1218
- Add simplified chat template for Mistral-7B-Instruct-v0.3 by @sbalandi in #1221
- Add OpenVINO Optimization Support Matrix table by @nikita-savelyevv in #1219
- Fix tests by @IlyasMoutawwakil in #1222
- [OV, Docs] Dark mode compatibility of optimization support matrix by @nikita-savelyevv in #1225
- [OV] INT4 configs for Qwen2.5-1.5B-Instruct and Llama-3.2-1B-Instruct by @nikita-savelyevv in #1231
- fix flux and sana export with diffusers 0.33+ by @eaidova in #1236
- Fix more ci by @IlyasMoutawwakil in #1228
- move sentencepiece to test requirements for unblocking python3.13 by @eaidova in #1240
- Update pkv precision at save_pretrained call by @nikita-savelyevv in #1235
- Install diffusers requirement during OV Full and Slow tests by @nikita-savelyevv in #1243
- Add nncf version as openvino model runtime flag by @nikita-savelyevv in #1244
- add support sana-sprint by @eaidova in #1245
- Replace compvis safety checker model with a tiny version by @nikita-savelyevv in #1250
- restore cache_position input in whisper by @eaidova in #1254
- Introduce OVCalibrationDatasetBuilder (part 1/2) by @nikita-savelyevv in #1232
- Update transformers by @IlyasMoutawwakil in #1242
- fix typo in unpatching decoder models by @eaidova in #1259
- Enable GLM4 for openvino by @openvino-dev-samples in #1249
- Fix failing INC CI tests by @changwangss in #1262
- [OV] Update `--sym` argument description by @nikita-savelyevv in #1263
- PTQ support for OVModelForFeatureExtraction and OVSentenceTransformer by @nikita-savelyevv in #1257
- Fix ipex CI by @jiqing-feng in #1260
- add OVSamModel for inference by @eaidova in #1229
- Fix documentation by @echarlaix in #1266
- PTQ support for OVModelForMaskedLM by @nikita-savelyevv in #1268
- upgrade transformers for ipex by @kaixuanliu in #1267
- Clean ipex tests by @jiqing-feng in #1270
- support LTX-video by @eaidova in #1264
- fix gptj export for transformers>4.49 by @eaidova in #1271
- Support SpeechT5 text-to-speech pipeline by OpenVINO by @rkazants in #1230
- Enable Qwen3 and Qwen3-MOE for OpenVINO by @openvino-dev-samples in #1214
- Add Mistral modeling optimization support for ipex by @kaixuanliu in #1269
- Avoid use of deprecated openvino.runtime by @rkazants in #1274
- upgrade ipex to 2.7 by @jiqing-feng in #1277
- support Zero-shot-Image-Classification by @eaidova in #1273
- fix bug when bs > 1 and `position_ids` is not provided for Mistral model by @kaixuanliu in #1276
- Fix minimum diffusers version for LTXPipeline by @echarlaix in #1279
- removal onnx fallback by @eaidova in #1272
- Replace deprecated openvino.runtime by @echarlaix in #1280
- int4 configs for Qwen3-1.7B and Qwen3-4B by @MaximProshin in #1282
- int4 config for starcoder2-15b by @MaximProshin in #1285
- Fix spaces in chat_template simplification for Mistral-7B-Instruct-v0.3 by @Wovchena in #1281
- Enable compile for ipex patched model with paged attention by @jiqing-feng in #1253
- Fix typos by @omahs in #1284
- int4 config for Qwen/Qwen3-8B by @MaximProshin in #1287
- add support phi4 multimodal by @eaidova in #1201
- Fix custom dataset processing for text encoding tasks by @nikita-savelyevv in #1286
- llama4 by @eaidova in #1226
- Default config for microsoft/Phi-4-multimodal-instruct by @nikita-savelyevv in #1289
- Remove OVBaseModelForSeq2SeqLM by @echarlaix in #1278
- Add compression tests for phi4mm by @nikita-savelyevv in #1292
- Update tiny model for ipex langchain tests by @echarlaix in #1296
- PTQ support for OVModelForZeroShotImageClassification by @nikita-savelyevv in #1283
- add token classification for qwen2 by @eaidova in #1299
- fix beam search test for latest optimum by @eaidova in #1290
- fix automatic task detection for phi4-multimodal during data-aware quant by @eaidova in #1293
- provide workaround for export openai whisper models by @eaidova in #1301
- Introduce named test reference values per submodel by @nikita-savelyevv in #1300
- set optimum version in setup by @echarlaix in #1298
- add gemma to skip check trace models by @echarlaix in #1306
- remove forcing input_ids by @eaidova in #1307
- Fixes openvino decoder models output by @echarlaix in #1308
- Make a workaround to avoid XPU crash for the `PagedAttention.reshape_and_cache_flash` API by @kaixuanliu in #1288