Performance optimizations
- Intel Processor Graphics and Xe architecture-based Graphics:
  - Improved performance of Winograd convolution.
- Intel Architecture processors:
  - Introduced initial performance optimizations for future Intel Core processors with Intel AVX2 and Intel DL Boost instruction support (code name Alder Lake).
  - Improved performance of int8 primitives for processors with Intel SSE4.1 instruction set support.
  - Improved performance of int8 and bfloat16 RNN and Inner Product primitives.
- AArch64-based processors:
  - Improved performance of Winograd convolution with Arm Compute Library (ArmCL).
  - Improved performance of int8 convolution with ArmCL.
  - Added JIT support for AArch64 and a JIT reorder implementation.
New functionality
- Introduced int8 support for LSTM primitive with projection for CPU.
Thanks to the contributors
This release contains contributions from the project core team as well as Alejandro Alvarez, Aleksandr Nikolaev @alenik01, Arthur Mitrano @aaraujom, Benjamin Fitch, Diana Bite @diaena, Kentaro Kawakami @kawakami-k, Nathan John Sircombe @nSircombe, Peter Caday @petercad, Rafik Saliev @rfsaliev, and yuri@FreeBSD @yurivict. We would also like to thank everyone who asked questions and reported issues.