oneapi-src/oneDNN v2.4

Performance Optimizations

  • Improved primitive cache performance for Intel Graphics products.
  • Intel Architecture Processors
    • Improved performance for future Intel Xeon Scalable processor (code name Sapphire Rapids). The functionality is disabled by default and should be enabled via CPU dispatcher control.
    • Improved binary primitive performance for cases where one of the tensors is broadcast.
    • Improved performance of the reduction, reorder, and shuffle primitives.
    • Improved performance of depthwise convolution forward propagation for processors with Intel AVX-512 support.
    • Improved performance of the forward inner product primitive for shapes with minibatch equal to 1 on processors with Intel AVX-512 support.
    • Improved performance of int8 matmul and inner product primitives for processors with Intel AVX2 and Intel DL Boost support.
  • Intel Graphics Products
    • Introduced initial optimizations for future Intel Arc graphics (code name Alchemist and DG2).
    • Improved performance of convolution and deconvolution primitives with a new JIT convolution kernel generator. These optimizations are identified by the jit:ir marker in the oneDNN verbose log.
  • AArch64-based Processors
    • Added support for bfloat16 acceleration with Arm Compute Library (ACL). The behavior is controlled by floating point math mode API.
    • Improved inner product, matmul, and eltwise primitives performance with ACL.
    • Introduced support for sum and for indirect and Winograd convolution implementations with ACL.
  • NVIDIA Graphics
    • Improved convolution performance with eltwise post-op.
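
The jit:ir marker mentioned above shows up in the verbose log. A sketch of how one might check whether the new kernel generator is being picked up (my_app is a placeholder for an application linked against oneDNN; the exact log fields can differ between versions):

```sh
# Enable verbose mode at runtime and filter for the new GPU convolution
# kernels; matching lines indicate the JIT IR-based implementation was used.
DNNL_VERBOSE=1 ./my_app 2>&1 | grep jit:ir
```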

Functionality

  • Introduced PReLU post-op support in convolution and matmul.
  • Extended maximum allowed post-ops chain for compute primitives (convolution, deconvolution, inner product, and matmul) to 32.
  • Introduced support for zero points in sum post-op for convolution and matmul. The functionality is implemented only for CPUs.
  • Extended binary primitive with support for mixed data types for input tensors. The functionality is implemented only for CPUs.
  • Extended sum post-op for convolution and matmul primitives with support for mixed data types. The functionality is implemented only for CPUs.
  • Added Unified Shared Memory (USM) support for OpenCL GPU runtime.
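
Several of the items above extend the post-ops attribute mechanism. The following is a minimal sketch, assuming oneDNN's C++ API from dnnl.hpp; append_prelu and the zero-point overload of append_sum correspond to this release's additions, but the argument values shown are purely illustrative:

```cpp
// Illustrative sketch: building a primitive attribute that chains a sum
// post-op with an int8 zero point and a PReLU post-op, to be passed when
// creating a convolution or matmul primitive descriptor.
#include "dnnl.hpp"

dnnl::primitive_attr make_conv_attr() {
    dnnl::post_ops po;
    // Sum post-op with a zero point (CPU implementations only in v2.4);
    // data type lets the accumulated tensor differ from the destination.
    po.append_sum(/*scale=*/1.0f, /*zero_point=*/128,
                  dnnl::memory::data_type::u8);
    // PReLU post-op; mask 0 means a single slope value shared across
    // the whole tensor.
    po.append_prelu(/*mask=*/0);

    dnnl::primitive_attr attr;
    attr.set_post_ops(po);
    return attr;
}
```

At execution time the PReLU slope is supplied as an additional argument to the primitive; see the post-ops documentation for the exact argument mapping.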

Usability

  • Added compile-time options to manage the set of supported primitives and workload types. See DNNL_ENABLE_WORKLOAD and DNNL_ENABLE_PRIMITIVE in the build options for more details. This feature makes it possible to reduce the binary footprint of the library for specialized applications.
  • Reduced overall library size by trimming down the use of templates, OpenCL headers, and TBB headers. The configurations that benefit the most are the CPU-only configuration with TBB threading and the GPU-only configuration. Note that the binary footprint also depends on the compiler and build options used to build the library.
  • Introduced floating point math mode API. The API allows the library to use bfloat16 or float16 hardware acceleration in fp32 operations. Currently this mode is supported only on AArch64 processors when oneDNN is built with ACL.
  • Added a build option DNNL_LIBRARY_NAME to change the library name and CMake target. This feature helps projects that use multiple oneDNN configurations.
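
The new build-time controls can be combined in a single CMake configure line. The option values below are illustrative only; consult the build options documentation for the supported sets:

```sh
# Example configure line: restrict the build to inference workloads and a
# subset of primitives, and rename the library/CMake target so that several
# differently configured oneDNN builds can coexist in one project.
cmake .. \
    -DDNNL_ENABLE_WORKLOAD=INFERENCE \
    -DDNNL_ENABLE_PRIMITIVE="CONVOLUTION;REORDER" \
    -DDNNL_LIBRARY_NAME=dnnl_custom
```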

Breaking Changes

  • Updated minimal supported ACL version to 21.08 (was 21.05).

Deprecated functionality

  • The Intel MKL-DNN compatibility API is deprecated and will be removed in the next update. See the Transition from Intel MKL-DNN to oneDNN page for instructions on moving to the new API.
  • Support for Intel Xeon Phi processors is deprecated and will be removed in the next release.

Thanks to the Contributors

This release contains contributions from the project core team as well as
Aleksandr Nikolaev @alenik01, Arthur Mitrano @aaraujom, Crefeda Rodrigues @cfRod, Diana Bite @diaena, Jing Xu @jingxu10, Kentaro Kawakami @kawakami-k, Kevin Putnam @intelkevinputnam, Mesut Meterelliyoz @mmeterel, MITSUNARI Shigeo @herumi, Nathan John Sircombe @nSircombe, Nicolas Chauvet @kwizart, Peter Caday @petercad. We would also like to thank everyone who asked questions and reported issues.
