This is a patch release containing the following changes to v2.6:
- Extended depthwise convolution post-op with support for arbitrary filter size, stride, and padding (79b019b)
- Improved GEMM performance with threadpool threading on systems with Intel AVX2 instruction set (2be0060)
- Fixed runtime error in GPU reduction primitive for specific tensor sizes (efbf9b5)
- Improved convolution performance on GPUs with Xe-HPG IP (f8de0c9, c1fb8ac)
- Updated ITT API to 3.22.5 (9b18676)
- Fixed correctness issues in reorder implementation for non-x64 systems (9961b86, 1020631, 8b960df, ef1d9fa, 8edd859, 39edcf6, 3e0a0d9, 1dff625, 8661958)
- Fixed handling of `inf` and `-inf` values in eltwise log algorithm (732cbdd, 3fd0f2e)
- Improved depthwise convolution performance on GPUs with Xe-HPG IP (7a6fe1d)
- Addressed failures in `test_isa_hints` gtest on GPUs (78c1c68)
- Fixed issues with bfloat16 GEMM producing NaNs in certain cases on GPUs with Xe-HPC IP (5d65970)
- Changed default layout to blocked for depthwise convolutions to avoid spurious reorders (78f231b)
- Addressed issue with incorrect values in padded areas for convolution with post-ops on GPUs (2e4ad3a)
- Fixed build issues with `-Werror=odr` option (27668dd)
- Addressed issues detected by clang USAN in BRGEMM kernel (2bbaa30, 9b3826f, b59b027)