Performance Optimizations

Intel Architecture Processors:
- Improved performance for 4th generation Intel Xeon Scalable processors (formerly Sapphire Rapids).
- Introduced FP16 support and initial optimizations for future Intel Xeon Scalable processors (code name Granite Rapids). This functionality is disabled by default and should be enabled via CPU dispatcher control.
Intel Graphics Products:
- Improved performance for Intel Data Center GPU Max Series (formerly Ponte Vecchio).
- Improved performance for Intel Arc graphics (formerly Alchemist and DG2) and Intel Data Center GPU Flex Series (formerly Arctic Sound-M).
AArch64-based Processors:
- Improved reorder performance for processors with Scalable Vector Extensions (SVE) support.
- Improved pooling performance with post-ops for processors with SVE 512 support.
- Improved batch normalization performance with non-default flags for processors with SVE 512 support.
- Improved performance of FP16 functionality with Compute Library for Arm Architecture (ACL).
- Improved deconvolution performance with ACL.
PowerPC64-based Processors:
- Improved int8 GEMM performance.

Functionality

- Introduced a new quantization scheme. Major changes include support for per-argument runtime scales in all primitives and unquantized bias.
- [experimental] Introduced Graph API support that simplifies oneDNN integration into applications. This functionality is disabled by default and can be enabled at build time with the ONEDNN_BUILD_GRAPH=ON flag.
- Introduced support for Intel DPC++/C++ Compiler 2023.0, including new features from the SYCL 2020 standard.
- Extended the persistent cache to cover the GPU engine object. This improvement allows applications to further reduce oneDNN initialization time.
- Extended the threadpool API with a function to indicate maximum available concurrency.
- Extended the binary primitive implementation on GPU with bfloat16 source and int8 destination support.
- Introduced pooling and reduction primitive support on AMD GPUs.
- Introduced reduction primitive support on NVIDIA GPUs.
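The arithmetic behind the new quantization scheme can be sketched in plain C++. This is a minimal illustration that does not call oneDNN: the helper name `quantized_dot` and all values are hypothetical. It assumes one scale per argument (source, weights, destination) supplied at execution time, a bias kept in f32 and added after the int32 accumulator is dequantized, and a destination scale applied as a divisor.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not a oneDNN function: an int8 dot product with one
// runtime scale per argument and an f32 (unquantized) bias.
inline std::int8_t quantized_dot(const std::vector<std::int8_t> &src,
        const std::vector<std::int8_t> &wei, float bias_f32,
        float scale_src, float scale_wei, float scale_dst) {
    // Accumulate the int8 products in int32.
    std::int32_t acc = 0;
    const std::size_t n = std::min(src.size(), wei.size());
    for (std::size_t i = 0; i < n; ++i)
        acc += static_cast<std::int32_t>(src[i])
                * static_cast<std::int32_t>(wei[i]);

    // Dequantize the accumulator with the source and weights scales,
    // then add the bias directly in f32 (assumption: bias is not pre-scaled).
    const float dst_f32
            = scale_src * scale_wei * static_cast<float>(acc) + bias_f32;

    // Requantize: divide by the destination scale (assumption), round to
    // nearest, and saturate to the int8 range.
    const float q = std::nearbyint(dst_f32 / scale_dst);
    return static_cast<std::int8_t>(std::max(-128.0f, std::min(127.0f, q)));
}
```

For instance, with `src = {10, -20, 30}`, `wei = {1, 2, 3}`, a bias of 0.5, and scales of 0.5 (source), 0.25 (weights), and 0.5 (destination), the int32 accumulator is 60, the f32 result is 8.0, and the returned int8 value is 16. Because the scales are ordinary function arguments here, they can change from call to call, which mirrors the idea of runtime scales as opposed to scales fixed at primitive creation time.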

- Extended the set of supported format tags to cover formats used in applications.

- Extended the GoogleTest (gtest) suite with support for the Parametric Rectified Linear Unit (PReLU) primitive.

- Removed deprecated APIs.
- Removed the operation descriptor object and made the memory descriptor object opaque. See details in the operation and memory descriptors RFC.
- Removed support for creation-time primitive scales and primitive output scales. See details in the quantization scaling RFC.
- Removed support for Intel DPC++/C++ Compiler with the SYCL 1.2.1 (aka SYCL 2017) standard.
- Removed the Winograd convolution implementation for the int8 data type.
- Updated the minimum supported ACL version to 22.08 (was 22.05).

This release contains contributions from the project core team as well as @akshatasangelkar, Aryan Karumuri @AryanKarumuri, Crefeda Rodrigues @cfRod, Divakar Mariyanna @bmdivakar, Gordon Fossum @austinpagan, Jonathan Deakin @jondea, Kentaro Kawakami @kawakami-k, lilianhuang @lilh9598, Milos Puzovic @milpuz01, Mona Minakshi @monaminakshi, Nathan John Sircombe @nSircombe, Peter Caday @petercad, and Sreekanth Yalachigere @sreekanth-yalachigere. We would also like to thank everyone who asked questions and reported issues.