openvinotoolkit/openvino 2024.0.0 on GitHub

Summary of major features and improvements  

More Generative AI coverage and framework integrations to minimize code changes.
- Improved out-of-the-box experience for TensorFlow* sentence encoding models when OpenVINO™ toolkit Tokenizers is installed.
- OpenVINO™ toolkit now supports Mixture of Experts (MoE), a newer architecture that enables more efficient processing of generative models through the pipeline.
- JavaScript developers now have seamless access to the OpenVINO API through a new JavaScript binding that integrates smoothly with JavaScript applications.
- New and noteworthy models validated: Mistral, StableLM-tuned-alpha-3b, and StableLM-Epoch-3B.
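To illustrate what the Mixture of Experts item refers to, here is a minimal, framework-independent sketch of top-k expert routing. It is not OpenVINO's implementation. The experts are hypothetical toy functions, and `moe_forward`, `softmax`, and `gate_logits` are illustrative names. The key idea is that a gate scores every expert, but only the top-k experts are actually evaluated for a given input, which is where the efficiency comes from.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for an expert sub-network: maps a vector to a vector.
Expert = Callable[[Sequence[float]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: Sequence[float],
    gate_logits: Sequence[float],
    experts: Sequence[Expert],
    top_k: int = 2,
) -> Tuple[List[float], List[int]]:
    """Route x to the top_k highest-scoring experts and mix their outputs."""
    # Indices of the top_k gate scores (ties keep the lower index first).
    chosen = sorted(range(len(experts)), key=lambda i: gate_logits[i], reverse=True)[:top_k]
    # Renormalize the gate scores over the selected experts only.
    weights = softmax([gate_logits[i] for i in chosen])
    out = [0.0] * len(x)
    for w, i in zip(weights, chosen):
        y = experts[i](x)  # unselected experts are never evaluated
        out = [o + w * v for o, v in zip(out, y)]
    return out, chosen


# Toy experts for demonstration only.
experts = [
    lambda v: [2 * t for t in v],  # expert 0 doubles its input
    lambda v: [t + 1 for t in v],  # expert 1 adds one
    lambda v: [0.0 for _ in v],    # expert 2 scores too low to be selected
]
out, chosen = moe_forward([1.0, 2.0], gate_logits=[1.0, 1.0, -5.0], experts=experts)
print(chosen, out)  # [0, 1] [2.0, 3.5]
```

Real MoE layers apply this routing per token inside a transformer block, with learned gate weights and feed-forward experts. The sketch only shows the selection and mixing step.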
Broader Large Language Model (LLM) support and more model compression techniques.
- Improved the quality of INT4 weight compression for LLMs by adding a popular technique, Activation-aware Weight Quantization (AWQ), to the Neural Network Compression Framework (NNCF). INT4 weight compression reduces memory requirements and helps speed up token generation.
- Enhanced LLM performance on Intel® CPUs through internal memory state improvements and INT8 precision for the KV-cache, specifically tailored for multi-query LLMs such as ChatGLM.
- Easier optimization and conversion of Hugging Face models: compress LLMs to INT8 and INT4 with the Hugging Face Optimum command-line interface and export them to OpenVINO format. Note that this is part of Optimum-Intel, which needs to be installed separately.
- The OpenVINO™ 2024.0 release makes things easier for developers by integrating more OpenVINO™ features with the Hugging Face* ecosystem. Quantization configurations for popular models can be stored directly in Hugging Face to compress models into INT4 format while preserving accuracy and performance.
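For readers unfamiliar with INT4 weight compression, the following is a simplified sketch of group-wise asymmetric 4-bit quantization. It is not NNCF's code, and it does not implement AWQ (which additionally rescales salient weight channels based on activation statistics before quantizing). For simplicity it stores a float offset per group, rather than the integer zero point a production implementation such as NNCF may use. The function names are hypothetical. Each group of weights gets its own scale, so every weight is stored as a 4-bit code plus a small per-group overhead. Assuming an FP16 scale and an FP16 offset per group of 128 weights, the cost is 4 + 32/128 = 4.25 bits per weight, compared with 16 bits for FP16.

```python
from typing import List, Tuple

Q_MAX = 15  # 4-bit unsigned codes take values 0..15


def quantize_int4_groupwise(
    weights: List[float], group_size: int
) -> Tuple[List[int], List[Tuple[float, float]]]:
    """Asymmetric min-max INT4 quantization with one (scale, offset) pair per group."""
    codes: List[int] = []
    params: List[Tuple[float, float]] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        # Spread this group's own range over the 16 levels; guard constant groups.
        scale = (hi - lo) / Q_MAX if hi > lo else 1.0
        params.append((scale, lo))
        # (w - lo) / scale lies in [0, 15]; the clamp only absorbs float noise.
        codes.extend(min(Q_MAX, max(0, round((w - lo) / scale))) for w in group)
    return codes, params


def dequantize_int4_groupwise(
    codes: List[int], params: List[Tuple[float, float]], group_size: int
) -> List[float]:
    """Map 4-bit codes back to approximate float weights."""
    out: List[float] = []
    for g, (scale, lo) in enumerate(params):
        for q in codes[g * group_size:(g + 1) * group_size]:
            out.append(lo + q * scale)
    return out


weights = [-1.0, -0.5, 0.0, 0.5, 0.2, 0.4, 0.6, 0.8]
codes, params = quantize_int4_groupwise(weights, group_size=4)
restored = dequantize_int4_groupwise(codes, params, group_size=4)
print(codes)
print([round(r, 3) for r in restored])
```

Smaller groups track local weight ranges more closely, which lowers the rounding error (at most half a group's scale per weight here), but they add more per-group overhead.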
More portability and performance to run AI at the edge, in the cloud, or locally.
- A preview of the plugin for the integrated Neural Processing Unit (NPU) in Intel® Core™ Ultra processors is now included in the main OpenVINO™ package on PyPI.
- Improved performance on ARM* by enabling the ARM threading library. In addition, multi-core ARM platforms are now supported, and FP16 precision is enabled by default on macOS*.
- Improved performance on ARM platforms with the throughput hint, which makes more efficient use of CPU cores and memory bandwidth.
- New and improved LLM serving samples from OpenVINO™ Model Server for multi-batch inputs and Retrieval Augmented Generation (RAG).
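As background for the RAG sample mentioned above, here is a minimal sketch of the retrieval step in Retrieval Augmented Generation. It is not the OpenVINO Model Server sample. The toy three-dimensional vectors stand in for the output of a real embedding model, and `retrieve` and `build_prompt` are hypothetical helper names. The step ranks stored passages by cosine similarity to the query embedding and prepends the best matches to the prompt sent to the LLM.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    docs: Sequence[str],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k passages whose embeddings are most similar to the query."""
    scored = [(doc, cosine(query_vec, vec)) for doc, vec in zip(docs, doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(question: str, passages: List[Tuple[str, float]]) -> str:
    """Prepend the retrieved passages to the question as grounding context."""
    context = "\n".join(f"- {text}" for text, _ in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Toy "embeddings" stand in for a real embedding model.
docs = ["Passage about NPUs", "Passage about ARM threading", "Passage about INT4 compression"]
doc_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
top = retrieve([1.0, 0.0, 0.0], doc_vecs, docs, k=2)
print(build_prompt("Which accelerator is new?", top))
```

A production pipeline would replace the linear scan with a vector index and the toy vectors with model-generated embeddings. The ranking logic stays the same.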

Support Change and Deprecation Notices

Using deprecated features and components is not advised. They are available to enable a smooth transition to new solutions and will be discontinued in the future. To keep using discontinued features, you will have to revert to the last LTS OpenVINO version supporting them.
For more details, refer to the OpenVINO Legacy Features and Components page.
Discontinued in 2024.0:
- Runtime components:
  - Intel® Gaussian & Neural Accelerator (Intel® GNA). Consider using the Neural Processing Unit (NPU) for low-powered systems like Intel® Core™ Ultra or 14th generation and beyond.
  - OpenVINO C++/C/Python 1.0 APIs (see 2023.3 API transition guide for reference).
  - The entire ONNX Frontend legacy API (known as ONNX_IMPORTER_API).
  - The 'PerformanceMode.UNDEFINED' property of the OpenVINO Python API.
- Tools:
  - Deployment Manager. See installation and deployment guides for current distribution options.
  - Accuracy Checker.
  - Post-Training Optimization Tool (POT). Neural Network Compression Framework (NNCF) should be used instead.
  - A git patch for NNCF integration with huggingface/transformers. The recommended approach is to use huggingface/optimum-intel for applying NNCF optimization on top of models from Hugging Face.
  - Support for Apache MXNet, Caffe, and Kaldi model formats. Conversion to ONNX may be used as a solution.
Deprecated and to be removed in the future:
- The OpenVINO™ Development Tools package (pip install openvino-dev) will be removed from installation options and distribution channels beginning with OpenVINO 2025.0.
- Model Optimizer will be discontinued with OpenVINO 2025.0. Consider using OpenVINO Model Converter (API call: OVC) instead. Follow the model conversion transition guide for more details.
- OpenVINO property Affinity API will be discontinued with OpenVINO 2025.0. It will be replaced with CPU binding configurations (ov::hint::enable_cpu_pinning).
- OpenVINO Model Server components:
  - Reshaping a model in runtime based on the incoming requests (auto shape and auto batch size) is deprecated and will be removed in the future. Using OpenVINO’s dynamic shape models is recommended instead.

You can find the OpenVINO™ toolkit 2024.0 release here:

Download archives* with OpenVINO™
Install it via Conda: conda install -c conda-forge openvino=2024.0.0
OpenVINO™ for Python: pip install openvino==2024.0.0

Acknowledgements

Thanks for contributions from the OpenVINO developer community:
@rghvsh
@YaritaiKoto
@Abdulrahman-Adel
@jvr0123
@sami0i
@guy-tamir
@rupeshs
@karanjakhar
@abhinav231-valisetti
@rajatkrishna
@lukazlim
@siddhant-0707
@tiger100256-hu

Release documentation is available here: https://docs.openvino.ai/2024
Release Notes are available here: https://www.intel.com/content/www/us/en/developer/articles/release-notes/openvino/2024-0.html
