oneapi-src/oneDNN v3.0.1 on GitHub

This is a patch release containing the following changes to v3.0:

Fixed potential correctness issue in convolution weight gradient with 1x1 filter and strides (e589966)
Improved performance of convolution, deconvolution, inner product, and matmul primitives with scales on Intel CPUs (38319f1, 18de927, b6170d1, 85171b0)
Reverted MEMFD allocator in Xbyak to avoid failures in high-load scenarios (eaaa41b)
Fixed array out of bounds issue in bfloat16 convolution weight gradient on Intel CPUs (a17a64c)
Improved compatibility with future versions of Intel GPU driver (eb7a0a0)
Fixed segfault in fp16 and bfloat16 convolution backward propagation on systems with Intel AMX support (293561b)
Fixed build issue with GCC 13 (1d7971c)
Fixed correctness issue in Vanilla GRU flavor of int8 RNN primitive on Intel CPUs (f4a149c, fbf8dca)
Added check for unsupported arguments in binary primitive implementation for AArch64-based processors (5bb9070)
Fixed correctness issue in int8 convolution with zero-points on Intel Data Center GPU Max Series (96e868c)
Fixed runtime error in convolution primitive with small number of channels on Xe-based graphics (068893e)
Removed use of OpenCL C variable length arrays in reduction primitive implementation for Intel GPUs (41e8612)
Fixed correctness issue in matmul and inner product primitives on Intel Data Center GPU Max Series (a1e6bc5, dbb7c28)
Fixed segfault in fp16 and bfloat16 convolution backward propagation on future Intel Xeon processors (code name Sierra Forest) (399b7c5)
Fixed runtime error in Graph API for partitions with quantized matmul and add operations (f881da5, 699ba75, b8d21a5, 9421fb2)
Fixed convolution performance regression on Xe-based graphics (1869bf2)
Improved convolution performance with OHWI and OIHW weight formats on Intel Data Center GPU Max Series (2d0b31e, 5bd5d52)
Fixed include files handling in build system affecting CMake projects relying on oneDNN (c616453)
Added tbb::finalize to tests and examples to address intermittent test crashes with TBB runtime (891a415, c79e543, 8312c3a, 1a32b95, bd0389d, f05013d, ab7938f, 31c9e7b, f3261e4, d58ac41, f8c67b9, 258849b, b20a8c7)
Fixed segfault in fp16 convolution primitive on future Intel Xeon processors (code name Granite Rapids) (a574fff)
Fixed correctness issue in fp16 convolution primitive on future Intel Xeon processors (code name Sierra Forest) (f165ed8)
Fixed correctness issue in int8 convolution primitive on Intel CPUs (ca15922, 27845b8)
Fixed correctness issue in int8 convolution primitive on Intel Data Center GPU Max Series (8bb651c)
Fixed correctness issue in resampling primitive with post-ops on Intel CPUs (aa52a51)
Addressed excessive memory consumption in 3D convolution on Intel CPUs (3d6412a, 097acb5, fd69663)
Fixed segfault in convolution with sum and relu post-ops on Intel CPUs (63ad769, 1b13037, 0a8116b, 9972cb8)
Addressed convolution performance regression with small number of channels on Intel GPUs (d3af877)
Worked around MSVS 2019 bug resulting in build failures on Windows (4024775)
Updated code base formatting to clang-format 11 (23576f9, 0b1bf84)