## 🚀 New Features & Enhancements

### OpenVINO
- Add MAIRA-2 support by @eaidova in #1145
- Add support for `nf4_f8e4m3` quantization mode by @nikita-savelyevv in #1148
- Add DeepSeek support by @eaidova in #1155
- Add Qwen2.5-VL support by @eaidova in #1163
- Add LLaVA-Next-Video support by @eaidova in #1183
- Add GOT-OCR2 support by @eaidova in #1202
- Add Gemma 3 support by @eaidova in #1198
- Add SmolVLM and Idefics3 support by @eaidova in #1210
- Add Phi-3-MoE support by @eaidova in #1215
- Add OVSamModel for inference by @eaidova in #1229
- Add Phi-4-multimodal support by @eaidova in #1201
- Add Llama 4 support by @eaidova in #1226
- Add zero-shot image classification support by @eaidova in #1273
- Add PTQ support for OVModelForZeroShotImageClassification by @nikita-savelyevv in #1283
- Add full int8 quantization support for diffusers by @l-bat in #1193
- Add SANA-Sprint support by @eaidova in #1245
- Add PTQ support for OVModelForMaskedLM by @nikita-savelyevv in #1268
- Add LTX-Video support by @eaidova in #1264
- Add Qwen3 and Qwen3-MOE support by @openvino-dev-samples in #1214
- Add SpeechT5 text-to-speech support with OpenVINO by @rkazants in #1230
- Add GLM4 support by @openvino-dev-samples in #1249
- Add PTQ support for OVModelForFeatureExtraction and OVSentenceTransformer by @nikita-savelyevv in #1257
- Introduce OVCalibrationDatasetBuilder by @nikita-savelyevv in #1232
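As a quick, hedged illustration of how the new model support above is typically exercised: a model can be exported to OpenVINO IR with weight compression through the `optimum-cli` interface. The model id and output directory below are examples only, and exact flag availability (including the new `nf4_f8e4m3` mode, exposed via the quantization options) depends on the installed optimum-intel and NNCF versions.

```shell
# Export a newly supported model to OpenVINO IR with 4-bit weight compression.
# Model id and output directory are illustrative.
optimum-cli export openvino \
  --model Qwen/Qwen3-0.6B \
  --weight-format int4 \
  qwen3_ov_int4
```

The resulting directory can then be loaded with the corresponding `OVModelFor*` class for inference.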
### IPEX
- Add Qwen2 support by @jiqing-feng in #1107
- Enable quantization model support by @jiqing-feng in #1074
- Add support for flash decoding on xpu by @kaixuanliu in #1118
- Add Phi support by @jiqing-feng in #1175
- Enable compilation for patched model with paged attention by @jiqing-feng in #1253
- Add Mistral modeling optimization support for ipex by @kaixuanliu in #1269
### Transformers compatibility
- Add compatibility with transformers v4.49 by @echarlaix in #1172
- Add compatibility with transformers v4.50 and v4.51 by @IlyasMoutawwakil in #1242
## 🔧 Key Fixes & Optimizations
- Fix misplaced configs saving by @eaidova in #1159
- Check if nncf is installed before running quantization from optimum-cli by @nikita-savelyevv in #1154
- Fix automatic-speech-recognition-with-past quantization from CLI by @nikita-savelyevv in #1180
- Propagate OV*QuantizationConfig kwargs to nncf calls by @nikita-savelyevv in #1179
- Fix model field names for OVBaseModelForSeq2SeqLM by @nikita-savelyevv in #1184
- Align loading dtype logic for diffusers with other models by @eaidova in #1187
- Fix generation for statically reshaped diffusion pipeline by @eaidova in #1199
- Add `ov_submodels` property to `OVBaseModel` by @nikita-savelyevv in #1177
- Fix flux and sana export with diffusers 0.33+ by @eaidova in #1236
- Update pkv precision at save_pretrained call by @nikita-savelyevv in #1235
- Remove ONNX fallback when converting to OpenVINO by @eaidova in #1272
- Fix custom dataset processing for text encoding tasks by @nikita-savelyevv in #1286
- Fix openvino decoder models output by @echarlaix in #1308
## What's Changed
- fix export phi3 with --trust-remote-code by @eaidova in #1147
- Skip test_aware_training_quantization test by @nikita-savelyevv in #1149
- Check if nncf is installed before running quantization from optimum-cli by @nikita-savelyevv in #1154
- enable qwen2 model by @jiqing-feng in #1107
- maira2 support by @eaidova in #1145
- Add slow tests for lower transformers version by @echarlaix in #1144
- fix misplaced configs saving by @eaidova in #1159
- Add default int4 config for DeepSeek-R1-Distill-Llama-8B by @nikita-savelyevv in #1158
- Remove unnecessary SD reload from saved dir by @l-bat in #1162
- resolve complicated chat templates during tokenizer saving by @eaidova in #1151
- Trigger tests for maira2 for compatible transformers version by @echarlaix in #1161
- use Tensor.numpy() instead of np.array(Tensor) by @eaidova in #1153
- [OV] Add support for `nf4_f8e4m3` quantization mode by @nikita-savelyevv in #1148
- support updated chat template for llava-next by @eaidova in #1166
- avoid extra reshaping to max_model_length for unet by @eaidova in #1164
- Enable quant model support by @jiqing-feng in #1074
- [OV] Add default int4 configurations for DeepSeek-R1-Distill-Qwen models by @nikita-savelyevv in #1168
- Deprecate OVTrainer by @nikita-savelyevv in #1167
- Support deepseek models export by @eaidova in #1155
- add support for flash decoding on xpu by @kaixuanliu in #1118
- deprecate TSModelForCausalLM by @echarlaix in #1173
- transformers 4.49 by @echarlaix in #1172
- Update ipex Ci to torch 2.6 by @jiqing-feng in #1176
- add support qwen2.5vl by @eaidova in #1163
- enable phi by @jiqing-feng in #1175
- Add `ov_submodels` property to `OVBaseModel` by @nikita-savelyevv in #1177
- [OV] Fix automatic-speech-recognition-with-past quantization from CLI by @nikita-savelyevv in #1180
- Propagate OV*QuantizationConfig kwargs to nncf calls by @nikita-savelyevv in #1179
- [OV] Add int4 config for Llama-3.1-8b model id aliases by @nikita-savelyevv in #1182
- Fix model field names for OVBaseModelForSeq2SeqLM by @nikita-savelyevv in #1184
- [OV] Enable back phi3_v 4bit compression test by @nikita-savelyevv in #1185
- align loading dtype logic for diffusers with other models by @eaidova in #1187
- attempt to resolve 4.49 compatibility issues and fix input processing… by @eaidova in #1190
- fix logits_to_keep by @jiqing-feng in #1188
- warm up does not work for compiled model by @jiqing-feng in #1189
- Add default int4 configs for Phi-4-mini-instruct and Qwen2.5-7B-Instruct by @nikita-savelyevv in #1194
- add support llava-next-video by @eaidova in #1183
- upgrade transformers to 4.49 for patching models by @jiqing-feng in #1196
- add support got-ocr2 by @eaidova in #1202
- fix generation for statically reshaped diffusion pipeline by @eaidova in #1199
- add gemma3 support by @eaidova in #1198
- enable awq tests by @jiqing-feng in #1195
- fix tests running with nightly by @eaidova in #1205
- fix internvl2 patching for transformers>=4.48 by @eaidova in #1206
- Support full int8 quantization for diffusers by @l-bat in #1193
- Update default int4 config for llama-2-7b-chat-hf by @nikita-savelyevv in #1216
- Patch by @jiqing-feng in #1200
- add falcon3 simplified chat template by @eaidova in #1217
- add support SmolVLM and Idefics3 models by @eaidova in #1210
- fix checking available files if from_onnx=True by @eaidova in #1208
- remove `XPULinearXXX` class definition for ipex by @kaixuanliu in #1212
- phi3moe support by @eaidova in #1215
- Fix crash issue of IPEX XPU's rotary_embedding API by @kaixuanliu in #1218
- Add simplified chat template for Mistral-7B-Instruct-v0.3 by @sbalandi in #1221
- Add OpenVINO Optimization Support Matrix table by @nikita-savelyevv in #1219
- Fix tests by @IlyasMoutawwakil in #1222
- [OV, Docs] Dark mode compatibility of optimization support matrix by @nikita-savelyevv in #1225
- [OV] INT4 configs for Qwen2.5-1.5B-Instruct and Llama-3.2-1B-Instruct by @nikita-savelyevv in #1231
- fix flux and sana export with diffusers 0.33+ by @eaidova in #1236
- Fix more ci by @IlyasMoutawwakil in #1228
- move sentencepiece to test requirements for unblocking python3.13 by @eaidova in #1240
- Update pkv precision at save_pretrained call by @nikita-savelyevv in #1235
- Install diffusers requirement during OV Full and Slow tests by @nikita-savelyevv in #1243
- Add nncf version as openvino model runtime flag by @nikita-savelyevv in #1244
- add support sana-sprint by @eaidova in #1245
- Replace compvis safety checker model with a tiny version by @nikita-savelyevv in #1250
- restore cache_position input in whisper by @eaidova in #1254
- Introduce OVCalibrationDatasetBuilder (part 1/2) by @nikita-savelyevv in #1232
- Update transformers by @IlyasMoutawwakil in #1242
- fix typo in unpatching decoder models by @eaidova in #1259
- Enable GLM4 for openvino by @openvino-dev-samples in #1249
- Fix failing INC CI tests by @changwangss in #1262
- [OV] Update `--sym` argument description by @nikita-savelyevv in #1263
- PTQ support for OVModelForFeatureExtraction and OVSentenceTransformer by @nikita-savelyevv in #1257
- Fix ipex CI by @jiqing-feng in #1260
- add OVSamModel for inference by @eaidova in #1229
- Fix documentation by @echarlaix in #1266
- PTQ support for OVModelForMaskedLM by @nikita-savelyevv in #1268
- upgrade transformers for ipex by @kaixuanliu in #1267
- Clean ipex tests by @jiqing-feng in #1270
- support LTX-video by @eaidova in #1264
- fix gptj export for transformers>4.49 by @eaidova in #1271
- Support SpeechT5 text-to-speech pipeline by OpenVINO by @rkazants in #1230
- Enable Qwen3 and Qwen3-MOE for OpenVINO by @openvino-dev-samples in #1214
- Add Mistral modeling optimization support for ipex by @kaixuanliu in #1269
- Avoid use of deprecated openvino.runtime by @rkazants in #1274
- upgrade ipex to 2.7 by @jiqing-feng in #1277
- support Zero-shot-Image-Classification by @eaidova in #1273
- fix bug when bs > 1 and `position_ids` is not provided for Mistral model by @kaixuanliu in #1276
- Fix minimum diffusers version for LTXPipeline by @echarlaix in #1279
- removal onnx fallback by @eaidova in #1272
- Replace deprecated openvino.runtime by @echarlaix in #1280
- int4 configs for Qwen3-1.7B and Qwen3-4B by @MaximProshin in #1282
- int4 config for starcoder2-15b by @MaximProshin in #1285
- Fix spaces in chat_template simplification for Mistral-7B-Instruct-v0.3 by @Wovchena in #1281
- Enable compile for ipex patched model with paged attention by @jiqing-feng in #1253
- Fix typos by @omahs in #1284
- int4 config for Qwen/Qwen3-8B by @MaximProshin in #1287
- add support phi4 multimodal by @eaidova in #1201
- Fix custom dataset processing for text encoding tasks by @nikita-savelyevv in #1286
- llama4 by @eaidova in #1226
- Default config for microsoft/Phi-4-multimodal-instruct by @nikita-savelyevv in #1289
- Remove OVBaseModelForSeq2SeqLM by @echarlaix in #1278
- Add compression tests for phi4mm by @nikita-savelyevv in #1292
- Update tiny model for ipex langchain tests by @echarlaix in #1296
- PTQ support for OVModelForZeroShotImageClassification by @nikita-savelyevv in #1283
- add token classification for qwen2 by @eaidova in #1299
- fix beam search test for latest optimum by @eaidova in #1290
- fix automatic task detection for phi4-multimodal during data-aware quant by @eaidova in #1293
- provide workaround for export openai whisper models by @eaidova in #1301
- Introduce named test reference values per submodel by @nikita-savelyevv in #1300
- set optimum version in setup by @echarlaix in #1298
- add gemma to skip check trace models by @echarlaix in #1306
- remove forcing input_ids by @eaidova in #1307
- Fixes openvino decoder models output by @echarlaix in #1308
- Make a workaround to avoid XPU crash for the `PagedAttention.reshape_and_cache_flash` API by @kaixuanliu in #1288