Performance optimizations
- Intel Processor Graphics and Xe architecture-based Graphics:
  - Improved performance of convolutions and matmul primitives.
  - Improved performance of int8 convolutions for NHWC activations format.
- Intel Architecture processors:
  - Improved performance of primitives for NHWC activations format.
  - Improved fp32 GEMM performance for small N.
  - Improved performance of int8 primitives for processors with Intel SSE4.1 instruction set support.
- AArch64-based processors:
  - Added support for Arm Performance Libraries (ArmPL). ArmPL provides an optimized GEMM implementation for AArch64.
  - Added support for Arm Compute Library (ArmCL). ArmCL provides optimized convolution implementations for AArch64.
New Functionality
- Added support for IBM Z (s390x) and IBM POWER (powerpc64) architectures.
- Introduced RNN GRU for GPU.
- Introduced int8 RNN GRU for CPU.
- Introduced asymmetric quantization support for convolutions and matmul.
- Introduced dilated pooling support.
- Extended the matmul primitive to support multiple batch dimensions and batch broadcast on CPU.
- (preview) Introduced a binary post-op for (de)convolution, pooling, eltwise, binary, inner product, and matmul (see the sketch after this list).
- (preview) Increased the number of supported post-ops per primitive to 20.
- (preview) Introduced a reduction primitive for CPU. Together with post-ops, this functionality allows implementing normalization.
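As an illustration of the preview binary post-op combined with the extended matmul batch broadcast, below is a minimal sketch using the oneDNN C++ API. The shapes, variable names, and the choice of fusing an element-wise addition are assumptions for demonstration only; since the post-op is a preview feature, support may vary with build and configuration.

```cpp
#include <unordered_map>
#include "dnnl.hpp"

using namespace dnnl;
using tag = memory::format_tag;
using dt = memory::data_type;

int main() {
    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    // Batched matmul C = A * B, where B has batch size 1 and is
    // broadcast across the batch dimension of A (hypothetical shapes).
    memory::desc a_md({8, 16, 32}, dt::f32, tag::abc);
    memory::desc b_md({1, 32, 64}, dt::f32, tag::abc);
    memory::desc c_md({8, 16, 64}, dt::f32, tag::abc);

    // Binary post-op: fuse an element-wise addition of a second tensor
    // into the matmul epilogue.
    memory::desc add_md({8, 16, 64}, dt::f32, tag::abc);
    post_ops po;
    po.append_binary(algorithm::binary_add, add_md);
    primitive_attr attr;
    attr.set_post_ops(po);

    auto pd = matmul::primitive_desc(matmul::desc(a_md, b_md, c_md), attr, eng);
    auto prim = matmul(pd);

    // Memory objects backing the tensors (left uninitialized here).
    memory a_mem(a_md, eng), b_mem(b_md, eng), c_mem(c_md, eng), add_mem(add_md, eng);
    prim.execute(strm,
            {{DNNL_ARG_SRC, a_mem}, {DNNL_ARG_WEIGHTS, b_mem},
                    {DNNL_ARG_DST, c_mem},
                    {DNNL_ARG_ATTR_MULTIPLE_POST_OP(0) | DNNL_ARG_SRC_1, add_mem}});
    strm.wait();
    return 0;
}
```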
Thanks to the contributors
This release contains contributions from the project core team as well as Ben Fitch, Brian Shi, David Edelsohn @edelsohn, Diana Bite @diaena, Moaz Reyad @moazreyad, Nathan John Sircombe @nSircombe, Niels Dekker @N-Dekker, Peter Caday @petercad, Pinzhen Xu @pinzhenx, pkubaj @pkubaj, Tsao Zhong @CaoZhongZ. We would also like to thank everyone who asked questions and reported issues.