oneapi-src/oneDNN v3.6.2 on GitHub

This is a patch release containing the following changes to v3.6.1:

Updated Arm Compute Library (ACL) to 24.11.1.
Fixed segmentation fault issue in convolution primitive on processors with Intel AVX2 instruction set support (2eb3dd1).
Added workaround for build issue when using GCC 8.2 and GNU Binutils 2.27 (262fb02).
Fixed issue when ACL kernels are called in parallel with different execution contexts (f30310e).
Added a workaround for compilation issue for oneDNN Graph gtests when using GNU compiler (be16835).
Removed specific diagnostics from the function (937c658, 3dd6aa6).
Ignored GCC specific flags that produce false-positive issues (5488e8c).
Expanded brgemm unsupported cases handling mechanism on AArch64-based Processors (ac9c3d0).
Fixed unimplemented error when --cpu-isa-hints=prefer_ymm is used on AArch64-based Processors (373bc5b).
Enabled optimized reorder primitive between bf16 and f32 data on AArch64-based Processors (872ecac).
Enabled support of bf16 matmul primitive with fp32 output on AArch64-based Processors (b8bdd63).
Fixed issue in matmul primitive for 4-dimensional tensors on AArch64-based Processors (b3be239).
Fixed out-of-bound warnings in deconvolution primitive on AArch64-based Processors (583215d).
Fixed correctness issue in reorder primitive with zero points for 4-dimensional shapes on AArch64-based Processors (e9d0fdb).
Enabled support of bf16 datatype in reorder primitive on AArch64-based Processors (188ae7f).
Fixed performance regression for backward convolution primitive creation time (2b3389f).
Improved performance of fp16 matmul with int4 weights on Intel GPUs based on Xe2 architecture (4c8fb2c, 3dd4f43, 280bd28).
Fixed performance regression for int8 convolution with large spatial sizes on processors with Intel AMX support (05d68df).
Restricted check for microkernel fusion support to cases when fusion functionality is actually used on Intel GPUs (48f6bd9).