OpenVINO™ toolkit 2026.1.0 (openvinotoolkit/openvino)


Summary of major features and improvements  

  • More GenAI coverage and framework integrations to minimize code changes

    • New models supported on CPUs & GPUs: Qwen3 VL
    • New models supported on CPUs: GPT-OSS 120B
    • Preview: Introducing the OpenVINO backend for llama.cpp, which enables optimized inference on Intel CPUs, GPUs, and NPUs. Validated on GGUF models such as Llama-3.2-1B-Instruct-GGUF, Phi-3-mini-4k-instruct-gguf, Qwen2.5-1.5B-Instruct-GGUF, and Mistral-7B-Instruct-v0.3.
    • New notebook: Unified VLM chatbot with video file support and interactive model switching across Qwen3-VL, Qwen2.5-VL, and LLaVa-NeXT-Video.
  • Broader LLM model support and more model compression techniques

    • OpenVINO™ GenAI adds TaylorSeer Lite caching for image and video generation, accelerating diffusion-transformer inference across Flux, SD3, and LTX-Video pipelines, aligned with Hugging Face Diffusers.
    • LTX-Video generation on GPU achieves end-to-end acceleration through fusion of RMSNorm and RoPE operators, significantly improving video generation performance.
    • OpenVINO™ GenAI adds dynamic LoRA support for Qwen3-VL and other vision-language models with an LLM backbone, allowing developers to swap adapters at runtime for efficient serving of multiple model variants in production without reloading the base model.
    • Preview: The release-weights API for ov::Model enables memory reclamation during model compilation on NPUs, delivering dramatically lower peak memory consumption for edge and client deployments. Users must set this property in ov::Model, and it will be applied during compilation.
  • More portability and performance to run AI at the edge, in the cloud, or locally.

    • Introducing support for Intel® Core™ Series 3 processors (formerly codenamed Wildcat Lake) and Intel® Arc™ Pro B70 Graphics with 32 GB of memory for single-GPU inference on 20-30B-parameter LLMs.
    • Prompt Lookup Decoding extended to vision-language pipelines, delivering significantly faster token generation for multimodal workloads on Intel CPUs and GPUs.
    • OpenVINO™ GenAI now has a smaller runtime footprint after eliminating ICU DLL dependencies from tokenization, leading to reduced memory usage, faster startup, and easier deployment.
    • OpenVINO GenAI introduces WhisperPipeline for Node.js via its NPM package, delivering production-ready speech recognition with word-level audio-to-text transcription.
    • OpenVINO™ Model Server enhances support for Qwen3-MOE and GPT-OSS-20b models, delivering improved performance, accuracy, and robust concurrent request handling with continuous batching. These pre-optimized models are available on Hugging Face for easy deployment. Additionally, the Model Server introduces image inpainting and outpainting capabilities via the /image endpoint for AI image editing.
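Prompt Lookup Decoding (extended to vision-language pipelines in this release) drafts candidate tokens by matching the tail of the generated context against the prompt itself, so no separate draft model is needed. A minimal, library-free sketch of the matching step — all names below are illustrative, not OpenVINO APIs, and real implementations also verify the draft against the target model:

```python
def prompt_lookup_draft(prompt_ids, context_ids, ngram=3, max_draft=5):
    """Propose draft tokens by finding the trailing n-gram of the
    generated context inside the prompt and copying what follows it.
    Simplified sketch of the core idea behind prompt lookup decoding."""
    if len(context_ids) < ngram:
        return []
    tail = context_ids[-ngram:]
    # Scan the prompt backwards for the most recent match of the tail n-gram.
    for i in range(len(prompt_ids) - ngram, -1, -1):
        if prompt_ids[i:i + ngram] == tail:
            return prompt_ids[i + ngram:i + ngram + max_draft]
    return []  # no match: fall back to normal one-token decoding

prompt = [11, 22, 33, 44, 55, 66, 77]
context = [99, 22, 33, 44]                   # ends with the n-gram 22, 33, 44
print(prompt_lookup_draft(prompt, context))  # → [55, 66, 77]
```

The proposed tokens are cheap to compute and are then accepted or rejected by a single batched forward pass of the target model, which is where the speedup comes from.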

Support Change and Deprecation Notices

  • Discontinued in 2026.0:

    • The deprecated openvino.runtime namespace has been removed. Please use the openvino namespace directly.
    • The deprecated openvino.Type.undefined has been removed. Please use openvino.Type.dynamic instead.
    • The PostponedConstant constructor signature has been updated for improved usability:
      • Old (removed): Callable[[Tensor], None]
      • New: Callable[[], Tensor]
    • The deprecated OpenVINO™ GenAI predefined generation configs were removed.
    • The deprecated OpenVINO GenAI support for the Whisper stateless decoder model has been removed. Please use a stateful model instead.
    • The deprecated OpenVINO GenAI StreamerBase put method, bool return type for callbacks, and ChunkStreamer class have been removed.
    • The NNCF create_compressed_model() method has been deprecated and removed in 2026.0. Please use the nncf.prune() method for unstructured pruning and nncf.quantize() for INT8 quantization.
    • NNCF optimization methods for TensorFlow models and the TensorFlow backend in NNCF have been deprecated and removed in 2026.0. It is recommended to use analogous PyTorch models for training-aware optimization methods, and OpenVINO™ IR, PyTorch, or ONNX models for post-training optimization methods in NNCF.
    • The following experimental NNCF methods have been deprecated and removed: NAS, Structural Pruning, AutoML, Knowledge Distillation, Mixed-Precision Quantization, Movement Sparsity.
    • The CPU plugin now requires the AVX2 instruction set as a minimum system requirement; CPUs supporting only SSE are no longer supported.
    • OpenVINO™ builds have migrated from RHEL 8 to RHEL 9.
    • The manylinux2014 build platform has been upgraded to manylinux_2_28. This aligns with modern toolchain requirements but also means that CentOS 7 is no longer supported due to glibc incompatibility.
    • macOS x86_64 is no longer supported.
    • APT & YUM repositories restructure: starting with release 2025.1, users can switch to the new repository structure for APT and YUM, which no longer uses year-based subdirectories (such as “2025”). The old (legacy) structure is unavailable starting with 2026.0. Detailed instructions are available on the relevant documentation pages.
    • OpenCV binaries removed from Docker images.
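The PostponedConstant signature change listed above inverts the direction of data flow: instead of filling a caller-provided Tensor in place, the maker callable now constructs and returns the Tensor itself. A runnable sketch using a plain list as a Tensor stand-in, so it needs no OpenVINO install — the function names are illustrative only:

```python
# Old (removed) shape: Callable[[Tensor], None] - the framework allocated
# the Tensor and the callable filled it in place.
def old_maker(tensor):
    tensor[:] = [1.0, 2.0, 3.0]   # mutate the pre-allocated buffer

# New shape: Callable[[], Tensor] - the callable takes no arguments and
# returns the (lazily materialized) Tensor itself.
def new_maker():
    return [1.0, 2.0, 3.0]

buffer = [0.0, 0.0, 0.0]
old_maker(buffer)
assert buffer == new_maker()      # both conventions yield the same data
```

The new form lets the framework defer both allocation and materialization until the value is actually needed, which is the point of a postponed constant.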
  • Deprecated and to be removed in the future:

    • Support for Ubuntu 20.04 has been discontinued due to the end of its standard support.
    • auto shape and auto batch size (reshaping a model at runtime) will be removed in the future. OpenVINO’s dynamic shape models are recommended instead.
    • With the release of Node.js v22, updated Node.js bindings are now available and compatible with the latest LTS version. These bindings do not support CentOS 7, as they rely on newer system libraries unavailable on legacy systems.
    • Starting with the 2026.0 release, a major internal refactoring of the graph iteration mechanism has been implemented for improved performance and maintainability. The legacy path can be enabled by setting the ONNX_ITERATOR=0 environment variable; it is deprecated and will be removed in future releases.
    • OpenVINO Model Server:
      • The dedicated OpenVINO operator for Kubernetes and OpenShift is now deprecated in favor of the recommended KServe operator. The OpenVINO operator will remain functional in upcoming OpenVINO Model Server releases but will no longer be actively developed. Since KServe provides broader capabilities, no loss of functionality is expected. On the contrary, more functionality will be accessible, and migration between other serving solutions and OpenVINO Model Server will be much easier.
      • TensorFlow Serving (TFS) API support is planned for deprecation. With the increasing adoption of the KServe API for classic models and the OpenAI API for generative workloads, usage of the TFS API has significantly declined. The removal date will be determined based on feedback, with a tentative target of mid-2026.
      • Support for stateful models will be deprecated. These capabilities were originally introduced for Kaldi audio models, which are no longer relevant. Current audio model support relies on the OpenAI API and pipelines implemented via the OpenVINO GenAI library.
      • The Directed Acyclic Graph Scheduler will be deprecated in favor of pipelines managed by the MediaPipe scheduler and will be removed in 2026.3. That approach gives more flexibility, includes a wider range of calculators, and supports processing accelerators.
    • OpenVINO™ GenAI:
      • start_chat() / finish_chat() APIs are deprecated and will be removed in a future major release. Pass a ChatHistory object directly to generate() instead.
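The start_chat()/finish_chat() migration replaces implicit chat-session state inside the pipeline with an explicit history passed on every call. A hedged sketch of the before/after pattern — the message shape is an assumption based on this note (check the GenAI docs for the exact ChatHistory type), and only the message structure below actually runs:

```python
# A ChatHistory is conceptually an ordered list of role/content messages
# (assumed shape; not a verified OpenVINO GenAI type definition).
chat_history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize OpenVINO in one sentence."},
]

# Deprecated pattern (session state kept inside the pipeline):
#   pipe.start_chat()
#   answer = pipe.generate("Summarize OpenVINO in one sentence.")
#   pipe.finish_chat()
#
# Replacement pattern (history passed explicitly on each call;
# `pipe` would be an openvino_genai.LLMPipeline instance):
#   answer = pipe.generate(chat_history)

print(len(chat_history))  # → 2
```

Making the history explicit also lets one pipeline instance serve interleaved conversations, since no per-session state lives in the pipeline itself.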

You can find the OpenVINO™ toolkit 2026.1 release here:

Acknowledgements

Thanks for contributions from the OpenVINO developer community:
@andersendsa
@arunthakur009
@ItzCobaltboy
@KarSri7694
@Rahuldrabit
@samthakur587
@satheeshbhukya
@Shi-pra-19
@Sujanian1304

Release documentation is available here: https://docs.openvino.ai/2026
Release Notes are available here: https://docs.openvino.ai/2026/about-openvino/release-notes-openvino.html
