openvinotoolkit/openvino 2024.5.0

Summary of major features and improvements  

  • More Gen AI coverage and framework integrations to minimize code changes

    • New models supported: Llama 3.2 (1B & 3B), Gemma 2 (2B & 9B), and YOLO11.
    • LLM support on NPU: Llama 3 8B, Llama 2 7B, Mistral-v0.2-7B, Qwen2-7B-Instruct, and Phi-3.
    • Noteworthy notebooks added: SAM 2, Llama 3.2, Llama 3.2 Vision, Wav2Lip, Whisper, and LLaVA.
    • Preview: support for Flax, a high-performance Python neural network library based on JAX. Its modular design allows for easy customization and accelerated inference on GPUs.
  • Broader Large Language Model (LLM) support and more model compression techniques.

    • Optimizations for built-in GPUs on Intel® Core™ Ultra Processors (Series 1) and Intel® Arc™ Graphics include KV cache compression for reduced memory use, usability improvements, and model load time optimizations that improve first token latency for LLMs.
    • Dynamic quantization was enabled to improve first token latency for LLMs on the built-in GPUs of Intel® Core™ Ultra Processors (Series 1), without impacting accuracy. Second token latency will also improve for large batch inference.
    • A new method to generate synthetic text data is implemented in the Neural Network Compression Framework (NNCF). This allows LLMs to be compressed more accurately with data-aware methods even when no dataset is available. Coming soon: this feature will be accessible via Optimum Intel on Hugging Face.
  • More portability and performance to run AI at the edge, in the cloud, or locally.

    • Support for Intel® Xeon® 6 Processors with P-cores (formerly codenamed Granite Rapids) and Intel® Core™ Ultra 200V series processors (formerly codenamed Arrow Lake-S).
    • Preview: the GenAI API enables multimodal AI deployment, with support for multimodal pipelines for improved contextual awareness, transcription pipelines for easy audio-to-text conversion, and image generation pipelines for streamlined text-to-visual conversion (see the pipeline sketch after this list).
    • Speculative decoding feature added to the GenAI API for improved performance and efficient text generation, using a small draft model that is periodically corrected by the full-size model (see the speculative decoding sketch after this list).
    • Preview: LoRA adapters are now supported in the GenAI API, so developers can quickly and efficiently customize image and text generation models for specialized tasks (see the LoRA sketch after this list).
    • The GenAI API now also supports LLMs on NPU, allowing developers to specify NPU as the target device, specifically for WhisperPipeline (whisper-base, whisper-medium, and whisper-small) and LLMPipeline (Llama 3 8B, Llama 2 7B, Mistral-v0.2-7B, Qwen2-7B-Instruct, and Phi-3 Mini-instruct). Use driver version 32.0.100.3104 or later for best performance (see the NPU sketch after this list).
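
For a rough feel of the preview multimodal, transcription, and image generation pipelines, here is a minimal sketch using the Python GenAI bindings. The model directories ("stable-diffusion-ov", "whisper-base-ov") and the audio file are placeholders for models already exported to OpenVINO IR, and keyword arguments may differ slightly across preview builds.

```python
# Hedged sketch of the preview GenAI pipelines. Model directories are placeholders
# (OpenVINO IR exported e.g. with optimum-cli); argument names may vary between
# preview builds.
import librosa
import openvino_genai as ov_genai
from PIL import Image

# Image generation: a text-to-image model exported to OpenVINO IR (placeholder path).
t2i = ov_genai.Text2ImagePipeline("stable-diffusion-ov", "CPU")
image_tensor = t2i.generate(
    "a watercolor painting of a lighthouse at dawn",
    width=512, height=512, num_inference_steps=20,
)
Image.fromarray(image_tensor.data[0]).save("lighthouse.png")

# Transcription: Whisper pipelines expect 16 kHz float PCM samples.
raw_speech, _ = librosa.load("sample.wav", sr=16000)
stt = ov_genai.WhisperPipeline("whisper-base-ov", "CPU")
print(stt.generate(raw_speech.tolist()))
```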
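
Speculative decoding in the GenAI API follows the draft-plus-verify pattern described above: the small draft model proposes tokens and the full-size model confirms or corrects them. The sketch below is modeled on the GenAI sample layout; the model directories are placeholders and the configuration fields shown should be checked against the samples shipped with the release.

```python
# Hedged sketch of speculative decoding with the GenAI API.
# Model directories are placeholders; parameter names follow the GenAI samples.
import openvino_genai as ov_genai

main_model_dir = "llama-3-8b-instruct-ov"     # full-size model (placeholder path)
draft_model_dir = "llama-3.2-1b-instruct-ov"  # small draft model (placeholder path)

pipe = ov_genai.LLMPipeline(
    main_model_dir,
    "CPU",
    draft_model=ov_genai.draft_model(draft_model_dir, "CPU"),
)

config = ov_genai.GenerationConfig()
config.max_new_tokens = 128
config.num_assistant_tokens = 5  # tokens the draft model speculates per verification step

print(pipe.generate("Explain speculative decoding in one paragraph.", config))
```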
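
LoRA adapters plug into the same pipelines. Below is a hedged sketch for text generation, assuming the Adapter/AdapterConfig pattern from the GenAI LoRA samples; the adapter file and model directory are placeholders, and image generation pipelines accept adapters analogously.

```python
# Hedged sketch of applying a LoRA adapter through the GenAI API.
# Paths are placeholders; the Adapter/AdapterConfig usage follows the GenAI LoRA
# samples and may differ between preview builds.
import openvino_genai as ov_genai

adapter = ov_genai.Adapter("my-task-lora.safetensors")  # placeholder adapter file
adapter_config = ov_genai.AdapterConfig(adapter)

# Register the adapter at load time, then select it per generate() call.
pipe = ov_genai.LLMPipeline("llama-3-8b-instruct-ov", "CPU", adapters=adapter_config)
print(pipe.generate("Summarize the quarterly report:", max_new_tokens=100, adapters=adapter_config))
```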
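
Targeting the NPU is a device-string change in the same API. A minimal sketch, assuming the models listed above have already been exported (the NPU LLM path typically expects symmetrically quantized INT4 IR); all paths are placeholders, and the driver version note above still applies.

```python
# Hedged sketch of running GenAI pipelines on NPU. Paths are placeholders for
# models exported/quantized for the NPU path.
import librosa
import openvino_genai as ov_genai

# LLM inference on NPU (one of the supported models, exported to OpenVINO IR).
llm = ov_genai.LLMPipeline("llama-3-8b-instruct-int4-ov", "NPU")
print(llm.generate("What is OpenVINO?", max_new_tokens=64))

# Whisper transcription on NPU (whisper-base/medium/small), 16 kHz float samples in.
raw_speech, _ = librosa.load("sample.wav", sr=16000)
stt = ov_genai.WhisperPipeline("whisper-base-ov", "NPU")
print(stt.generate(raw_speech.tolist()))
```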

Support Change and Deprecation Notices

  • Using deprecated features and components is not advised. They are available to enable a smooth transition to new solutions and will be discontinued in the future. To keep using discontinued features, you will have to revert to the last LTS OpenVINO version supporting them. For more details, refer to the OpenVINO Legacy Features and Components page.
  • Discontinued in 2024.0:
  • Deprecated and to be removed in the future:
    • The macOS x86_64 debug bins will no longer be provided with the OpenVINO toolkit, starting with OpenVINO 2024.5.
    • Python 3.8 is no longer supported, starting with OpenVINO 2024.5.
      • As MXNet does not support Python versions higher than 3.8 (according to the MXNet PyPI project), it is no longer supported by OpenVINO either.
    • Discrete Keem Bay devices are no longer supported, starting with OpenVINO 2024.5.
    • Support for discrete devices (formerly codenamed Raptor Lake) is no longer available for NPU.

You can find the OpenVINO™ toolkit 2024.5 release here:

Acknowledgements

Thanks for contributions from the OpenVINO developer community:
@aku221b
@halm-zenger
@hibahassan1
@hub-bla
@jagadeeshmadinni
@nashez
@tianyiSKY1
@tiebreaker4869

Release documentation is available here: https://docs.openvino.ai/2024
Release Notes are available here: https://docs.openvino.ai/2024/about-openvino/release-notes-openvino.html
