Summary of major features and improvements
More GenAI coverage and framework integrations to minimize code changes
- New models supported: Phi-4-mini-reasoning, AFM-4.5B, Gemma-3-1B-it, Gemma-3-4B-it, and Gemma-3-12B.
- NPU support added for: Qwen3-1.7B, Qwen3-4B, and Qwen3-8B.
- LLMs optimized for NPU are now available in the OpenVINO Hugging Face collection.
- Preview: Intel® Core™ Ultra Processor and Windows-based AI PCs can now leverage the OpenVINO™ Execution Provider for Windows* ML for a high-performance, off-the-shelf starting experience on Windows*.
Broader LLM model support and more model compression techniques
- The NPU plug-in adds support for longer contexts of up to 8K tokens, dynamic prompts, and dynamic LoRA for improved LLM performance.
- The NPU plug-in now supports dynamic batch sizes by reshaping the model to a batch size of 1 and concurrently managing multiple inference requests, enhancing performance and optimizing memory utilization.
- Accuracy improvements for GenAI models on both built-in and discrete graphics achieved through the implementation of the key cache compression per channel technique, in addition to the existing KV cache per-token compression method.
- OpenVINO™ GenAI introduces TextRerankPipeline for improved retrieval relevance and RAG pipeline accuracy, plus Structured Output for enhanced response reliability and function calling while ensuring adherence to predefined formats.
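The per-channel key cache compression mentioned above can be pictured with a toy example. The sketch below is plain Python, not OpenVINO's implementation: each channel of a key-cache block gets its own quantization scale, which keeps the relative error small even when channel magnitudes differ by orders of magnitude.

```python
# Conceptual sketch of per-channel int8 quantization of a key cache.
# Not OpenVINO's actual implementation; it only illustrates why a
# per-channel scale preserves accuracy better than a single scale
# when channel magnitudes differ widely.

def quantize_per_channel(cache):
    """cache: list of rows (tokens) x channels. Returns (qcache, scales)."""
    n_channels = len(cache[0])
    scales = []
    for c in range(n_channels):
        max_abs = max(abs(row[c]) for row in cache) or 1.0
        scales.append(max_abs / 127.0)  # map [-max_abs, max_abs] -> [-127, 127]
    qcache = [[round(row[c] / scales[c]) for c in range(n_channels)]
              for row in cache]
    return qcache, scales

def dequantize(qcache, scales):
    return [[q * s for q, s in zip(row, scales)] for row in qcache]

# Two channels with very different magnitudes.
cache = [[0.011, 95.0], [-0.009, -120.0], [0.013, 40.0]]
qcache, scales = quantize_per_channel(cache)
restored = dequantize(qcache, scales)

# With per-channel scales, even the tiny channel keeps ~1% relative error.
for orig, rec in zip(cache, restored):
    for o, r in zip(orig, rec):
        assert abs(o - r) <= abs(o) * 0.01 + 1e-6
```

A single shared scale over the same data would be dominated by the large channel, collapsing the small channel's values to zero after rounding.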
More portability and performance to run AI at the edge, in the cloud, or locally
- Announcing support for Intel® Arc™ Pro B-Series (B50 and B60).
- Preview: Hugging Face models that are GGUF-enabled for OpenVINO GenAI are now supported by the OpenVINO™ Model Server for popular LLM model architectures such as DeepSeek Distill, Qwen2, Qwen2.5, and Llama 3. This functionality reduces memory footprint and simplifies integration for GenAI workloads.
- With improved reliability and tool call accuracy, the OpenVINO™ Model Server boosts support for agentic AI use cases on AI PCs, while enhancing performance on Intel CPUs, built-in GPUs, and NPUs.
- int4 data-aware weights compression, now supported in the Neural Network Compression Framework (NNCF) for ONNX models, reduces memory footprint while maintaining accuracy and enables efficient deployment in resource-constrained environments.
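As a rough illustration of why int4 weight compression shrinks the memory footprint, two 4-bit values fit in each byte. The sketch below is a simplified stand-in for what a compression framework does internally (NNCF's real layout also stores scales and zero-points per group); it shows only the core packing idea in plain Python.

```python
# Simplified illustration of int4 weight packing: two 4-bit signed
# values per byte, i.e. 8x smaller than float32 storage (ignoring the
# scales/zero-points real schemes keep). Not NNCF's actual layout.

def pack_int4(values):
    """Pack a list of ints in [-8, 7] into bytes, two values per byte."""
    assert all(-8 <= v <= 7 for v in values)
    if len(values) % 2:
        values = values + [0]  # pad to an even count
    packed = bytearray()
    for lo, hi in zip(values[::2], values[1::2]):
        packed.append((lo & 0x0F) | ((hi & 0x0F) << 4))
    return bytes(packed)

def unpack_int4(packed, count):
    """Recover `count` signed int4 values from packed bytes."""
    values = []
    for byte in packed:
        for nibble in (byte & 0x0F, byte >> 4):
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values[:count]

weights = [-8, -1, 0, 3, 7, 5, -4]
packed = pack_int4(weights)
assert unpack_int4(packed, len(weights)) == weights
assert len(packed) == 4  # 7 int4 values in 4 bytes vs 28 bytes as float32
```

The "data-aware" part of the NNCF feature refers to choosing quantization parameters using calibration data, which the packing above deliberately leaves out.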
Support Change and Deprecation Notices
- Discontinued in 2025:
Runtime components:
- The OpenVINO Affinity API property is no longer available. It has been replaced with CPU binding configurations (ov::hint::enable_cpu_pinning).
- The openvino-nightly PyPI module has been discontinued. End-users should switch to the Simple PyPI nightly repo instead. More information is available in the Release Policy.
- Binary operations Node API has been removed from Python API after previous deprecation.
Tools:
- The OpenVINO™ Development Tools package (pip install openvino-dev) is no longer available for OpenVINO releases in 2025.
- Model Optimizer is no longer available. Consider using the new conversion methods instead. For more details, see the model conversion transition guide.
- Intel® Streaming SIMD Extensions (Intel® SSE) are currently not enabled in the binary package by default. They are still supported in the source code form.
- Legacy prefixes: l_, w_, and m_ have been removed from OpenVINO archive names.
OpenVINO GenAI:
- StreamerBase::put(int64_t token): the bool return value for the callback streamer is no longer accepted. The callback must now return one of the three values of the StreamingStatus enum.
- ChunkStreamerBase is deprecated. Use StreamerBase instead.
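The move from a bool return to a status enum can be sketched in a few lines of Python. This is an illustration of the callback-streamer pattern, not the GenAI C++/Python API itself; the enum member names and their semantics below are assumptions for the sketch.

```python
from enum import Enum

# Illustrative stand-in for a streaming-status enum: instead of a bare
# bool, the callback reports one of several explicit states. The member
# names here are assumptions, not the exact OpenVINO GenAI API.
class StreamingStatus(Enum):
    RUNNING = 0  # keep generating
    STOP = 1     # finish generation gracefully
    CANCEL = 2   # abort generation

def make_streamer(stop_word):
    """Build a callback streamer that returns a status enum, not a bool."""
    seen = []
    def callback(token_text):
        seen.append(token_text)
        if token_text == stop_word:
            return StreamingStatus.STOP
        return StreamingStatus.RUNNING
    return callback, seen

callback, seen = make_streamer("</s>")
assert callback("Hello") is StreamingStatus.RUNNING
assert callback("</s>") is StreamingStatus.STOP
assert seen == ["Hello", "</s>"]
```

An enum removes the ambiguity of the old bool contract, where it was unclear whether True meant "stop" or "continue".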
- The NNCF create_compressed_model() method is now deprecated. The nncf.quantize() method is recommended for Quantization-Aware Training of PyTorch and TensorFlow models.
- Deprecated the OpenVINO Model Server (OVMS) benchmark client in C++ using the TensorFlow Serving API.
- Deprecated and to be removed in the future:
- Python 3.9 is now deprecated and will be unavailable after OpenVINO version 2025.4.
- openvino.Type.undefined is now deprecated and will be removed with version 2026.0. openvino.Type.dynamic should be used instead.
- APT & YUM Repositories Restructure: starting with release 2025.1, users can switch to the new repository structure for APT and YUM, which no longer uses year-based subdirectories (such as "2025"). The old (legacy) structure will remain available until 2026, when the change will be finalized. Detailed instructions are available on the relevant documentation pages.
- OpenCV binaries will be removed from Docker images in 2026.
- The openvino namespace of the OpenVINO Python API has been redesigned, removing the nested openvino.runtime module. The old namespace is now considered deprecated and will be discontinued in 2026.0. A new namespace structure is available for immediate migration, and details will be provided through warnings and documentation.
- Starting with the next release, manylinux2014 will be upgraded to manylinux_2_28. This aligns with modern toolchain requirements but also means that CentOS 7 will no longer be supported due to glibc incompatibility.
- With the release of Node.js v22, updated Node.js bindings are now available and compatible with the latest LTS version. These bindings do not support CentOS 7, as they rely on newer system libraries unavailable on legacy systems.
You can find OpenVINO™ toolkit 2025.3 release here:
- Download archives* with OpenVINO™
- OpenVINO™ for Python:
pip install openvino==2025.3.0
Acknowledgements
Thanks for contributions from the OpenVINO developer community:
@mahdi-jfri
@11happy
@arunthakur009
@Vladislav-Denisov
@madhurthareja
@mohiuddin-khan-shiam
@Hmm-1224
@kuanxian1
@johnrhimawan
@kinnam888
Release documentation is available here: https://docs.openvino.ai/2025
Release Notes are available here: https://docs.openvino.ai/2025/about-openvino/release-notes-openvino.html