Summary of major features and improvements
- More GenAI coverage and framework integrations to minimize code changes
- New models supported:
- On CPUs & GPUs: Qwen3-Embedding-0.6B, Qwen3-Reranker-0.6B, Mistral-Small-24B-Instruct-2501.
- On NPUs: Gemma-3-4b-it and Qwen2.5-VL-3B-Instruct.
- Preview: Mixture of Experts (MoE) models optimized for CPUs and GPUs, validated for Qwen3-30B-A3B.
- GenAI pipeline integrations: Qwen3-Embedding-0.6B and Qwen3-Reranker-0.6B for enhanced retrieval/ranking, and Qwen2.5-VL-7B for video pipelines.
- Broader LLM model support and more model compression techniques
- Gold support for Windows ML* enables developers to deploy AI models and applications effortlessly across CPUs, GPUs, and NPUs on Intel® Core™ Ultra processor-powered AI PCs.
- The Neural Network Compression Framework (NNCF) ONNX backend now supports INT8 static post-training quantization (PTQ) and INT8/INT4 weight-only compression to ensure accuracy parity with OpenVINO IR format models. SmoothQuant algorithm support added for INT8 quantization.
- Accelerated multi-token generation for GenAI, leveraging optimized GPU kernels to deliver faster inference, smarter KV-cache reuse, and scalable LLM performance.
- GPU plugin updates include improved performance with prefix caching for chat history scenarios and enhanced LLM accuracy with dynamic quantization support for INT8.
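The weight-compression work mentioned above centers on INT8 quantization. As a toy illustration of what per-channel symmetric INT8 weight-only compression does to a weight matrix (illustrative only; NNCF's actual algorithms are considerably more sophisticated):

```python
import numpy as np

# Toy sketch of symmetric, per-output-channel INT8 weight-only quantization,
# the general kind of transform weight compression applies. Not NNCF code.
w = np.array([[0.4, -1.2, 0.05], [2.0, -0.3, 0.7]], dtype=np.float32)
scale = np.abs(w).max(axis=1, keepdims=True) / 127.0   # one scale per row
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_restored = q.astype(np.float32) * scale              # dequantized weights
max_err = np.abs(w - w_restored).max()                 # bounded by scale / 2
print(max_err)
```

The reconstruction error stays below half of the largest scale, which is why INT8 weight-only compression typically preserves accuracy well on LLM weights.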
- More portability and performance to run AI at the edge, in the cloud, or locally
- Announcing support for Intel® Core™ Ultra Processor Series 3.
- Encrypted blob format support added for secure model deployment with OpenVINO™ GenAI. Model weights and artifacts are stored and transmitted in an encrypted format, reducing risks of IP theft during deployment. Developers can deploy with minimal code changes using OpenVINO GenAI pipelines.
- OpenVINO™ Model Server and OpenVINO™ GenAI now extend support for Agentic AI scenarios with new features such as output parsing and improved chat templates for reliable multi-turn interactions, and preview functionality for the Qwen3-30B-A3B model. OVMS also introduces a preview for audio endpoints.
- NPU deployment is simplified with batch support, enabling seamless model execution across Intel® Core™ Ultra processors while eliminating driver dependencies. Models are reshaped to batch_size=1 before compilation.
- The improved NVIDIA Triton Server* integration with OpenVINO backend now enables developers to utilize Intel GPUs or NPUs for deployment.
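The encrypted-blob flow described above amounts to decrypting model artifacts in memory and handing the plain buffer to the runtime. A minimal sketch of that shape, using a placeholder XOR "cipher" purely for illustration (a real deployment would use a proper scheme such as AES, and the exact OpenVINO GenAI API for encrypted blobs may differ):

```python
# Hypothetical helper: XOR stands in for a real cipher. With openvino
# installed, the decrypted bytes could then be loaded without touching disk,
# e.g. via ov.Core().read_model(model=xml_bytes, weights=weights_tensor).
def xor_crypt(blob: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(blob))

key = b"secret-key"
plain = b"<model weights>"          # placeholder for real model artifacts
encrypted = xor_crypt(plain, key)   # stored/transmitted form
restored = xor_crypt(encrypted, key)
assert restored == plain
```

The point is that weights never need to exist unencrypted on disk; decryption happens just before the in-memory load.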
Support Change and Deprecation Notices
- Discontinued in 2025:
- Runtime components:
- The Affinity API property is no longer available. It has been replaced with the CPU binding configuration (ov::hint::enable_cpu_pinning).
- The runtime namespace for the Python API has been deprecated and is scheduled for removal in 2026.0. The new namespace structure has been delivered, and migration is possible immediately. Details will be communicated through warnings and via documentation.
- Binary operations Node API has been removed from Python API after previous deprecation.
- PostponedConstant Python API update: the PostponedConstant constructor signature is changing for better usability. The maker callable changes from Callable[[Tensor], None] to Callable[[], Tensor]. The old signature will be removed in version 2026.0.
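The shape of that signature change can be shown without OpenVINO at all, using plain lists as stand-ins for ov.Tensor (with openvino installed, the maker is what you pass to PostponedConstant):

```python
# Old, deprecated style: maker fills a caller-provided tensor in place.
# (Lists stand in for ov.Tensor in this sketch.)
def old_maker(tensor):              # Callable[[Tensor], None]
    tensor[:] = [1.0, 2.0, 3.0]

# New style, required from 2026.0: maker builds and returns the tensor itself.
def new_maker():                    # Callable[[], Tensor]
    return [1.0, 2.0, 3.0]

buf = [0.0] * 3
old_maker(buf)
assert buf == new_maker() == [1.0, 2.0, 3.0]
```

Returning the tensor lets the runtime defer both allocation and construction to the maker, instead of pre-allocating a buffer it may not need.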
- Tools:
- The OpenVINO™ Development Tools package (pip install openvino-dev) is no longer available for OpenVINO releases in 2025.
- Model Optimizer is no longer available. Consider using the new conversion methods instead. For more details, see the model conversion transition guide.
- Intel® Streaming SIMD Extensions (Intel® SSE) are currently not enabled in the binary package by default. They are still supported in source code form.
- Legacy prefixes l_, w_, and m_ have been removed from OpenVINO archive names.
- OpenVINO GenAI:
- StreamerBase::put(int64_t token).
- The bool value for the callback streamer is no longer accepted. It must now return one of the three values of the StreamingStatus enum.
- ChunkStreamerBase is deprecated. Use StreamerBase instead.
- Deprecated OpenVINO Model Server (OVMS) benchmark client in C++ using TensorFlow Serving API.
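The new streamer contract can be sketched as follows. A local stand-in enum is defined here so the example runs standalone; with openvino_genai installed, the real openvino_genai.StreamingStatus (with, to my understanding, RUNNING, STOP, and CANCEL members) should be returned instead:

```python
from enum import Enum

# Stand-in for openvino_genai.StreamingStatus, so the sketch is self-contained.
class StreamingStatus(Enum):
    RUNNING = 0   # keep generating
    STOP = 1      # stop, keep already-generated results
    CANCEL = 2    # stop, drop results

def streamer(subword):
    # Print each decoded chunk as it arrives. Returning a bool is no longer
    # accepted; the callback must return a StreamingStatus value.
    print(subword, end="")
    return StreamingStatus.RUNNING
```

A pipeline call such as pipe.generate(prompt, streamer=streamer) would then invoke this callback per decoded chunk.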
- NPU Device Plugin:
- Removed logic to detect and handle Intel® Core™ Ultra Processors (Series 1) drivers older than v1688. Since v1688 is the earliest officially supported driver, older versions (e.g., v1477) are no longer recommended or supported.
- Python 3.9 support will be discontinued starting with OpenVINO 2025.4 and Neural Network Compression Framework (NNCF) 2.19.0.
- Deprecated and to be removed in the future:
- openvino.Type.undefined is now deprecated and will be removed in version 2026.0. openvino.Type.dynamic should be used instead.
- The openvino-nightly PyPI module has been discontinued. End-users should proceed with the Simple PyPI nightly repo instead. More information in Release Policy.
- "auto shape" and "auto batch size" (reshaping a model in runtime) will be removed in the future. OpenVINO's dynamic shape models are recommended instead.
- MacOS x86 is no longer recommended for use due to the discontinuation of validation. Full support will be removed later in 2025.
- The openvino namespace of the OpenVINO Python API has been redesigned, removing the nested openvino.runtime module. The old namespace is now considered deprecated and will be discontinued in 2026.0. A new namespace structure is available for immediate migration. Details will be provided through warnings and documentation.
- Starting with OpenVINO release 2026.0, the CPU plugin will require support for the AVX2 instruction set as a minimum system requirement. The SSE instruction set will no longer be supported.
- APT & YUM repositories restructure: starting with release 2025.1, users can switch to the new repository structure for APT and YUM, which no longer uses year-based subdirectories (like "2025"). The old (legacy) structure will still be available until 2026, when the change will be finalized. Detailed instructions are available on the relevant documentation pages.
- OpenCV binaries will be removed from Docker images in 2026.
- Starting with the 2026.0 release, OpenVINO will migrate builds based on RHEL 8 to RHEL 9.
- The NNCF create_compressed_model() method is now deprecated and will be removed in 2026. The nncf.quantize() method is recommended for Quantization-Aware Training of PyTorch models.
- NNCF optimization methods for TensorFlow models and the TensorFlow backend in NNCF are deprecated and will be removed in 2026. It is recommended to use analogous PyTorch models for training-aware optimization methods, and OpenVINO IR, PyTorch, and ONNX models for post-training optimization methods from NNCF.
- The following experimental NNCF methods are deprecated and will be removed in 2026: NAS, Structural Pruning, AutoML, Knowledge Distillation, Mixed-Precision Quantization, Movement Sparsity.
- Starting with the 2026.0 release, manylinux2014 will be upgraded to manylinux_2_28. This aligns with modern toolchain requirements but also means that CentOS 7 will no longer be supported due to glibc incompatibility.
- With the release of Node.js v22, updated Node.js bindings are now available and compatible with the latest LTS version. These bindings do not support CentOS 7, as they rely on newer system libraries unavailable on legacy systems.
- OpenVINO Model Server:
- The dedicated OpenVINO operator for Kubernetes and OpenShift is now deprecated in favor of the recommended KServe operator. The OpenVINO operator will remain functional in upcoming OpenVINO Model Server releases but will no longer be actively developed. Since KServe provides broader capabilities, no loss of functionality is expected; on the contrary, more functionality will be accessible, and migration between other serving solutions and OpenVINO Model Server will be much easier.
- TensorFlow Serving (TFS) API support is planned for deprecation. With the increasing adoption of the KServe API for classic models and the OpenAI API for generative workloads, usage of the TFS API has declined significantly. The removal date will be determined based on feedback, with a tentative target of mid-2026.
- Support for stateful models will be deprecated. These capabilities were originally introduced for Kaldi audio models, which are no longer relevant. Current audio model support relies on the OpenAI API and pipelines implemented via the OpenVINO GenAI library.
- The Directed Acyclic Graph Scheduler will be deprecated in favor of pipelines managed by the MediaPipe scheduler and will be removed in 2026.3. That approach gives more flexibility, includes a wider range of calculators, and supports processing accelerators.
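Most of the deprecations above map one old name to one replacement. A small lookup like the following can help audit a codebase for them (the pairs are taken from the notes above; the helper itself is a convenience sketch, not an official tool):

```python
# Old-name -> replacement pairs, taken from the deprecation notes above.
MIGRATIONS = {
    "from openvino.runtime import Core": "import openvino as ov  # then ov.Core()",
    "openvino.Type.undefined": "openvino.Type.dynamic",
    "create_compressed_model": "nncf.quantize()",
    "ChunkStreamerBase": "StreamerBase",
}

def suggest(snippet):
    """Return a replacement hint if the snippet contains a deprecated name."""
    for old, new in MIGRATIONS.items():
        if old in snippet:
            return new
    return None

print(suggest("t = openvino.Type.undefined"))
```

Running a scan like this over a project before the 2026.0 release should surface most of the calls that will break.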
You can find OpenVINO™ toolkit 2025.4 release here:
- Download archives* with OpenVINO™
- Install it via Conda: conda install -c conda-forge openvino=2025.4.0
- Install OpenVINO™ for Python: pip install openvino==2025.4.0
Acknowledgements
Thanks for contributions from the OpenVINO developer community:
@arunthakur009
@mahdi-jfri
@nashez
@RudraCodesForU
@Sujanian1304
@Vladislav-Denisov
Release documentation is available here: https://docs.openvino.ai/2025
Release Notes are available here: https://docs.openvino.ai/2025/about-openvino/release-notes-openvino.html