Performance optimizations

Intel Architecture processors

Improved performance of convolutional neural network (CNN) functionality with NHWC activations on all supported processors.
Improved binary primitive performance for the broadcast case.
Improved performance of eltwise primitive backpropagation and the corresponding post-ops.
Improved performance of pooling, resampling, and LRN primitives.
Improved performance of bfloat16 and fp32 weights gradient convolutions with groups.
Improved performance of int8 convolutions with a 1x1 kernel and spatial strides.
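
For readers unfamiliar with the NHWC layout mentioned above, the following minimal sketch compares linear offsets in dense NCHW and NHWC tensors. It is plain Python, not oneDNN code; the function names and the tiny tensor shape are illustrative assumptions.

```python
def nchw_offset(n, c, h, w, C, H, W):
    """Linear offset of element (n, c, h, w) in a dense NCHW tensor."""
    return ((n * C + c) * H + h) * W + w


def nhwc_offset(n, c, h, w, C, H, W):
    """Linear offset of the same logical element in a dense NHWC tensor."""
    return ((n * H + h) * W + w) * C + c


# Hypothetical activation tensor: 1 image, 3 channels, 2x2 spatial size.
C, H, W = 3, 2, 2

# Offsets of every channel of the pixel at (h=0, w=1).
nchw_pixel = [nchw_offset(0, c, 0, 1, C, H, W) for c in range(C)]
nhwc_pixel = [nhwc_offset(0, c, 0, 1, C, H, W) for c in range(C)]

print(nchw_pixel)  # [1, 5, 9]: channels are H*W elements apart
print(nhwc_pixel)  # [3, 4, 5]: channels are adjacent in memory
```

In NHWC the channels of one pixel are contiguous, which is the layout many frameworks use natively for activations.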

Intel Processor Graphics and Xe architecture-based Graphics

Introduced initial optimizations for Xe architecture-based Graphics (code-named DG1 and Tiger Lake).
Improved performance of convolutional neural network (CNN) functionality with NHWC activations.

Usability

Introduced support for Arm* 64-bit Architecture (AArch64) and other non-x86 processors.
Separated primitive cache state from the engine, making the cache persistent.
Introduced an API for managing primitive cache state.
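
To illustrate what a cache that outlives any single engine, with an adjustable capacity, can look like, here is a conceptual sketch in Python. It is not oneDNN's implementation: the class `PrimitiveCache`, its methods, the string keys, and the least-recently-used eviction policy are all assumptions made for illustration. In the library itself, the capacity is managed through functions such as `dnnl_get_primitive_cache_capacity` and `dnnl_set_primitive_cache_capacity` in the C API.

```python
from collections import OrderedDict


class PrimitiveCache:
    """Toy LRU cache keyed by a primitive description (hypothetical model)."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._entries = OrderedDict()

    def get_capacity(self):
        return self._capacity

    def set_capacity(self, capacity):
        # Shrinking evicts least recently used entries; 0 disables caching.
        self._capacity = capacity
        while len(self._entries) > self._capacity:
            self._entries.popitem(last=False)

    def get_or_create(self, key, create):
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        primitive = create()
        if self._capacity > 0:
            self._entries[key] = primitive
            if len(self._entries) > self._capacity:
                self._entries.popitem(last=False)
        return primitive

    def __len__(self):
        return len(self._entries)


# One process-wide cache, independent of any engine object.
GLOBAL_CACHE = PrimitiveCache(capacity=2)
created = []  # records which primitives were actually built


def make(key):
    def create():
        created.append(key)
        return f"primitive<{key}>"
    return GLOBAL_CACHE.get_or_create(key, create)


make("conv_3x3")
make("conv_3x3")  # cache hit, nothing new is built
make("relu")
make("pool")      # capacity is 2, so "conv_3x3" is evicted
GLOBAL_CACHE.set_capacity(0)  # drops all entries and disables caching
```

Keeping the cache outside the engine means that destroying and recreating an engine does not discard primitives that were already built.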

Validation

Introduced a validation mode to detect out-of-bounds accesses.
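
As a rough illustration of how out-of-bounds writes can be detected, the sketch below surrounds a buffer with guard bytes and checks them after use. This is a generic technique shown in Python under stated assumptions; oneDNN's validation mode may work differently (for example, by protecting memory pages), and `GuardedBuffer`, `GUARD`, and the method names are hypothetical.

```python
GUARD = b"\xA5" * 8  # hypothetical guard pattern placed around the payload


class GuardedBuffer:
    """Byte buffer with guard regions before and after the payload."""

    def __init__(self, size):
        self.size = size
        self._raw = bytearray(GUARD + bytes(size) + GUARD)

    def write(self, offset, data):
        # Deliberately unchecked, to mimic a kernel with an indexing bug.
        start = len(GUARD) + offset
        self._raw[start:start + len(data)] = data

    def check(self):
        """Return True if both guard regions are still intact."""
        head = bytes(self._raw[:len(GUARD)])
        tail_start = len(GUARD) + self.size
        tail = bytes(self._raw[tail_start:tail_start + len(GUARD)])
        return head == GUARD and tail == GUARD


buf = GuardedBuffer(4)
buf.write(0, b"\x01\x02\x03\x04")  # stays within the 4-byte payload
ok_before = buf.check()

buf.write(2, b"\xff\xff\xff\xff")  # runs 2 bytes past the payload
ok_after = buf.check()

print(ok_before, ok_after)  # True False
```

Guard bytes only catch writes, and only after the fact; page protection can also catch reads and fault at the offending instruction.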

Thanks to the contributors

This release contains contributions from the project core team as well as Anuj Mittal @anujm1, Arthur Mitrano @aaraujom, Benjamin Fitch, Ilia Taraban @itaraban, Leona C. @indie, Nathan John Sircombe @nSircombe, Sergey Nesterov @cepera, Tsao Zhong @CaoZhongZ, yuri@FreeBSD @yurivict. We would also like to thank everyone who asked questions and reported issues.

oneapi-src/oneDNN v1.5 on GitHub