Performance Optimizations
- Intel Architecture Processors:
  - Improved performance for 4th generation Intel Xeon Scalable processor (formerly Sapphire Rapids).
  - Introduced FP16 support and initial optimizations for future Intel Xeon Scalable processor (code name Granite Rapids). The functionality is disabled by default and should be enabled via CPU dispatcher control.
- Intel Graphics Products:
  - Improved performance for Intel Data Center GPU Max Series (formerly Ponte Vecchio).
  - Improved performance for Intel Arc graphics (formerly Alchemist and DG2) and Intel Data Center GPU Flex Series (formerly Arctic Sound-M).
- AArch64-based Processors:
  - Improved reorder performance for processors with Scalable Vector Extensions (SVE) support.
  - Improved pooling performance with post-ops for processors with SVE 512 support.
  - Improved batch normalization performance with non-default flags for processors with SVE 512 support.
  - Improved performance of FP16 functionality with Compute Library for Arm Architecture (ACL).
  - Improved deconvolution performance with ACL.
- PowerPC64-based Processors:
  - Improved int8 GEMM performance.
Functionality
- Introduced a new quantization scheme. Major changes include support for per-argument runtime scales in all primitives and unquantized bias.
- [experimental] Introduced Graph API support that simplifies oneDNN integration into applications. The functionality is disabled by default and can be enabled at build time with the `ONEDNN_BUILD_GRAPH=ON` flag.
- Introduced support for Intel DPC++/C++ Compiler 2023.0, including new features from the SYCL 2020 standard.
- Extended persistent cache to cover GPU engine object. This improvement allows applications to further reduce oneDNN initialization time.
- Extended threadpool API with a function to indicate maximum available concurrency.
- Extended binary primitive implementation on GPU with bfloat16 source and int8 destination support.
- Introduced pooling and reduction primitives support on AMD GPUs.
- Introduced reduction primitive support on NVIDIA GPUs.
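The per-argument scales and unquantized bias mentioned in the quantization item above can be illustrated with a small standalone sketch. It does not use the oneDNN API. The `ArgScales` struct and the `dot_s32` and `quantized_dot` functions are hypothetical names introduced only for this illustration. The arithmetic is an assumption about the scheme: source and weights scales multiply the int32 accumulator, the bias is added as plain f32, and the result is divided by the destination scale. The quantization scaling RFC remains the authoritative description.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical holder for one scale per argument; not a oneDNN type.
struct ArgScales {
    float src = 1.0f;
    float weights = 1.0f;
    float dst = 1.0f;
};

// int8 x int8 dot product accumulated in int32.
inline std::int32_t dot_s32(const std::vector<std::int8_t> &src,
        const std::vector<std::int8_t> &wei) {
    assert(src.size() == wei.size());
    std::int32_t acc = 0;
    for (std::size_t i = 0; i < src.size(); ++i)
        acc += static_cast<std::int32_t>(src[i])
                * static_cast<std::int32_t>(wei[i]);
    return acc;
}

// Assumed model: dst_s8 = saturate(round((s_src * s_wei * acc + bias_f32) / s_dst)).
// The bias is passed in f32 and is not pre-scaled by the caller.
inline std::int8_t quantized_dot(const std::vector<std::int8_t> &src,
        const std::vector<std::int8_t> &wei, float bias_f32,
        const ArgScales &s) {
    assert(s.dst > 0.0f);
    const float acc_f32
            = s.src * s.weights * static_cast<float>(dot_s32(src, wei));
    const float dst_f32 = (acc_f32 + bias_f32) / s.dst;
    const float rounded = std::nearbyint(dst_f32);
    // Saturate to the int8 range before narrowing.
    const float clamped = std::max(-128.0f, std::min(127.0f, rounded));
    return static_cast<std::int8_t>(clamped);
}
```

Because every scale is an ordinary runtime value in this sketch, the same routine can be called with different scales on each invocation, which is the property the per-argument runtime scales are meant to provide.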
Usability
- Extended the set of supported format tags to cover formats used in applications.
Validation
- Extended the GoogleTest (gtest) suite with support for the Parametric Rectified Linear Unit (PReLU) primitive.
Breaking Changes
- Removed deprecated APIs.
- Removed the operation descriptor object and made the memory descriptor object opaque. See details in the operation and memory descriptors RFC.
- Removed support for creation-time primitive scales and for primitive output scales. See details in the quantization scaling RFC.
- Removed support for Intel DPC++/C++ Compiler with SYCL 1.2.1 (aka SYCL 2017) standard.
- Removed Winograd convolution implementation for int8 data type.
- Updated the minimum supported ACL version to 22.08 (was 22.05).
Thanks to the Contributors
This release contains contributions from the project core team as well as @akshatasangelkar, Aryan Karumuri @AryanKarumuri, Crefeda Rodrigues @cfRod, Divakar Mariyanna @bmdivakar, Gordon Fossum @austinpagan, Jonathan Deakin @jondea, Kentaro Kawakami @kawakami-k, lilianhuang @lilh9598, Milos Puzovic @milpuz01, Mona Minakshi @monaminakshi, Nathan John Sircombe @nSircombe, Peter Caday @petercad, and Sreekanth Yalachigere @sreekanth-yalachigere. We would also like to thank everyone who asked questions and reported issues.