This is a patch release containing the following changes to v3.6.1:
- Updated Arm Compute Library (ACL) to 24.11.1.
- Fixed segmentation fault issue in convolution primitive on processors with Intel AVX2 instruction set support (2eb3dd1).
- Added a workaround for a build issue when using GCC 8.2 and GNU Binutils 2.27 (262fb02).
- Fixed an issue when ACL kernels are called in parallel with different execution contexts (f30310e).
- Added a workaround for a compilation issue in oneDNN Graph gtests when using the GNU compiler (be16835).
- Removed specific diagnostics from the function (937c658, 3dd6aa6).
- Ignored GCC-specific flags that produce false-positive issues (5488e8c).
- Expanded the mechanism for handling unsupported brgemm cases on AArch64-based processors (ac9c3d0).
- Fixed an unimplemented error when using --cpu-isa-hints=prefer_ymm on AArch64-based processors (373bc5b).
- Enabled optimized reorder primitive between bf16 and f32 data on AArch64-based processors (872ecac).
- Enabled support for bf16 matmul primitive with fp32 output on AArch64-based processors (b8bdd63).
- Fixed an issue in matmul primitive for 4-dimensional tensors on AArch64-based processors (b3be239).
- Fixed out-of-bounds warnings in deconvolution primitive on AArch64-based processors (583215d).
- Fixed a correctness issue in reorder primitive with zero points for 4-dimensional shapes on AArch64-based processors (e9d0fdb).
- Enabled support for bf16 data type in reorder primitive on AArch64-based processors (188ae7f).
- Fixed performance regression for backward convolution primitive creation time (2b3389f).
- Improved performance of fp16 matmul with int4 weights on Intel GPUs based on Xe2 architecture (4c8fb2c, 3dd4f43, 280bd28).
- Fixed performance regression for int8 convolution with large spatial sizes on processors with Intel AMX support (05d68df).
- Restricted check for microkernel fusion support to cases when fusion functionality is actually used on Intel GPUs (48f6bd9).