Summary of major features and improvements
- More GenAI coverage and framework integrations to minimize code changes.
- New models supported: Qwen 2.5, DeepSeek-R1-Distill-Llama-8B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-1.5B, FLUX.1 Schnell, and FLUX.1 Dev.
- Whisper Model: Improved performance on CPUs, built-in GPUs, and discrete GPUs with the GenAI API (see the first sketch after this list).
- Preview: Introducing NPU support for torch.compile, giving developers the ability to use the OpenVINO backend to run the PyTorch API on NPUs. 300+ deep learning models are enabled from the TorchVision, Timm, and TorchBench repositories (see the second sketch after this list).
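For the Whisper item above, a minimal transcription sketch with the GenAI API's WhisperPipeline; the model directory and audio file are placeholders, and librosa is used here only to load the audio as the 16 kHz mono float samples Whisper expects:

```python
import librosa
import openvino_genai

model_dir = "whisper-base-ov"  # placeholder: an exported OpenVINO Whisper model

# Load the audio as 16 kHz mono float samples, the input format Whisper expects.
raw_speech, _ = librosa.load("sample.wav", sr=16000)

pipe = openvino_genai.WhisperPipeline(model_dir, "CPU")  # or "GPU"
print(pipe.generate(raw_speech.tolist()))
```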
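And a sketch of the torch.compile preview: importing openvino.torch registers the "openvino" backend, and the options dictionary selects the target device; treat the "NPU" device string as an assumption to verify against the documentation for your hardware.

```python
import torch
import torchvision.models as models
import openvino.torch  # noqa: F401 -- registers the "openvino" torch.compile backend

model = models.resnet50(weights="DEFAULT").eval()

# Route compilation through OpenVINO and target the NPU.
compiled = torch.compile(model, backend="openvino", options={"device": "NPU"})

with torch.no_grad():
    out = compiled(torch.randn(1, 3, 224, 224))
print(out.shape)
```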
- Broader Large Language Model (LLM) support and more model compression techniques.
- Preview: Addition of Prompt Lookup to the GenAI API improves 2nd token latency for LLMs by effectively utilizing predefined prompts that match the intended use case (see the first sketch after this list).
- Preview: The GenAI API now offers image-to-image inpainting functionality. This feature enables models to generate realistic content by inpainting specified modifications and seamlessly integrating them with the original image (see the second sketch after this list).
- Asymmetric KV Cache compression is now enabled for INT8 on CPUs, resulting in lower memory consumption and improved 2nd token latency, especially with long prompts that require significant memory. The option must be explicitly specified by the user (see the third sketch after this list).
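As referenced above, a minimal prompt-lookup sketch modeled on the openvino_genai samples; the model directory, prompt, and tuning values are illustrative placeholders:

```python
import openvino_genai as ov_genai

model_dir = "TinyLlama-1.1B-ov"  # placeholder: an exported OpenVINO LLM

# prompt_lookup=True switches the pipeline to prompt-lookup decoding.
pipe = ov_genai.LLMPipeline(model_dir, "CPU", prompt_lookup=True)

config = ov_genai.GenerationConfig()
config.max_new_tokens = 128
config.num_assistant_tokens = 5  # candidate tokens proposed per step
config.max_ngram_size = 3        # longest n-gram matched against the prompt

print(pipe.generate("Return the code above with docstrings added.", config))
```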
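A sketch of the inpainting preview, assuming the GenAI InpaintingPipeline and the NHWC uint8 tensor convention used by the GenAI image samples; the model directory, file names, and prompt are placeholders:

```python
import numpy as np
import openvino as ov
import openvino_genai as ov_genai
from PIL import Image

def read_image(path: str) -> ov.Tensor:
    # NHWC uint8 tensor, the layout used by the GenAI image pipelines.
    return ov.Tensor(np.array(Image.open(path).convert("RGB"))[None])

pipe = ov_genai.InpaintingPipeline("sd-2-inpainting-ov", "CPU")  # placeholder model dir
result = pipe.generate(
    "a red brick fireplace",
    read_image("room.png"),       # original image
    read_image("room_mask.png"),  # white pixels mark the region to repaint
)
Image.fromarray(result.data[0]).save("result.png")
```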
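For the KV cache item, a sketch of explicitly requesting an 8-bit KV cache on the CPU plugin via the kv_cache_precision hint; whether this same hint selects the new asymmetric INT8 mode is an assumption to verify in the release notes:

```python
import openvino as ov
import openvino.properties.hint as hints

core = ov.Core()
model = core.read_model("llm.xml")  # placeholder IR

# Explicitly opt in to an INT8 (u8) KV cache on CPU.
compiled = core.compile_model(model, "CPU",
                              {hints.kv_cache_precision: ov.Type.u8})
```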
- More portability and performance to run AI at the edge, in the cloud, or locally.
- Support for the latest Intel® Core™ Ultra 200H series processors (formerly codenamed Arrow Lake-H).
- Integration of the OpenVINO™ backend with the Triton Inference Server allows developers to utilize the Triton server for enhanced model-serving performance when deploying on Intel CPUs.
- Preview: A new OpenVINO™ backend integration allows developers to leverage OpenVINO performance optimizations directly within Keras 3 workflows for faster AI inference on CPUs, built-in GPUs, discrete GPUs, and NPUs. This feature is available with the latest Keras 3.8 release (see the sketch after this list).
- The OpenVINO Model Server now supports native Windows Server deployments, allowing developers to achieve better performance by eliminating container overhead and simplifying GPU deployment.
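For the Keras 3 preview above, a minimal sketch: the backend is selected through the KERAS_BACKEND environment variable before keras is imported, and the OpenVINO backend covers inference only (e.g. predict), not training:

```python
import os
os.environ["KERAS_BACKEND"] = "openvino"  # must be set before importing keras

import keras
import numpy as np

# weights=None keeps the example self-contained; any Keras 3 model works.
model = keras.applications.MobileNetV2(weights=None)
preds = model.predict(np.random.rand(1, 224, 224, 3).astype("float32"))
print(preds.shape)
```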
Support Change and Deprecation Notices
- Now deprecated:
- Legacy prefixes l_, w_, and m_ have been removed from OpenVINO archive names.
- The runtime namespace for the Python API has been marked as deprecated and is designated for removal in 2026.0. The new namespace structure is already available, and migration is possible immediately (see the first sketch after this list). Details will be communicated through warnings and via documentation.
- The NNCF create_compressed_model() method is deprecated. The nncf.quantize() method is now recommended for Quantization-Aware Training of PyTorch and TensorFlow models (see the second sketch after this list).
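To illustrate the namespace migration, a before/after sketch; model.xml is a placeholder IR file:

```python
# Deprecated: importing from the runtime namespace.
# from openvino.runtime import Core

# Recommended: use the top-level openvino namespace instead.
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")
compiled = core.compile_model(model, "CPU")
```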
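And a sketch of the recommended nncf.quantize() flow for a PyTorch model; the synthetic calibration data stands in for real samples, and the subsequent fine-tuning loop (not shown) is what turns this into Quantization-Aware Training:

```python
import nncf
import torch
import torchvision

model = torchvision.models.resnet18(weights=None)

# Calibration data: a small synthetic set standing in for real samples;
# the transform function maps each dataset item to the model's input.
data = [torch.randn(1, 3, 224, 224) for _ in range(10)]
calibration_dataset = nncf.Dataset(data, lambda item: item)

# Inserts fake-quantize operations; fine-tune the returned model with a
# regular PyTorch training loop to complete Quantization-Aware Training.
quantized_model = nncf.quantize(model, calibration_dataset)
```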
You can find the OpenVINO™ toolkit 2025.0 release here:
- Download archives* with OpenVINO™
- Install it via Conda: conda install -c conda-forge openvino=2025.0.0
- OpenVINO™ for Python: pip install openvino==2025.0.0
Acknowledgements
Thanks for contributions from the OpenVINO developer community:
@0xfedcafe
@11happy
@cocoshe
@emir05051
@geeky33
@h6197627
@hub-bla
@Manideep-Kanna
@nashez
@shivam5522
@sumhaj
@vatsalashanubhag
@xyz-harshal
Release documentation is available here: https://docs.openvino.ai/2025
Release Notes are available here: https://docs.openvino.ai/2025/about-openvino/release-notes-openvino.html