This is a patch release containing the following changes to v3.0:
- Fixed potential correctness issue in convolution weight gradient with 1x1 filter and strides (e589966)
- Improved convolution, deconvolution, inner product, and matmul primitives performance with scales on Intel CPUs (38319f1, 18de927, b6170d1, 85171b0)
- Reverted MEMFD allocator in Xbyak to avoid failures in high-load scenarios (eaaa41b)
- Fixed array out of bounds issue in `bfloat16` convolution weight gradient on Intel CPUs (a17a64c)
- Improved compatibility with future versions of Intel GPU driver (eb7a0a0)
- Fixed segfault in `fp16` and `bfloat16` convolution backward propagation on systems with Intel AMX support (293561b)
- Fixed build issue with GCC 13 (1d7971c)
- Fixed correctness issue in `int8` RNN primitive Vanilla GRU flavor on Intel CPUs (f4a149c, fbf8dca)
- Added check for unsupported arguments in binary primitive implementation for AArch64-based processors (5bb9070)
- Fixed correctness issue in `int8` convolution with zero-points on Intel Data Center GPU Max Series (96e868c)
- Fixed runtime error in convolution primitive with small number of channels on Xe-based graphics (068893e)
- Removed use of OpenCL C variable length arrays in reduction primitive implementation for Intel GPUs (41e8612)
- Fixed correctness issue in matmul and inner product primitives on Intel Data Center GPU Max Series (a1e6bc5, dbb7c28)
- Fixed segfault in `fp16` and `bfloat16` convolution backward propagation on future Intel Xeon processors (code name Sierra Forest) (399b7c5)
- Fixed runtime error in Graph API for partitions with quantized matmul and add operations (f881da5, 699ba75, b8d21a5, 9421fb2)
- Fixed convolution performance regression on Xe-based graphics (1869bf2)
- Improved convolution performance with `OHWI` and `OIHW` weight formats on Intel Data Center GPU Max Series (2d0b31e, 5bd5d52)
- Fixed include files handling in build system affecting CMake projects relying on oneDNN (c616453)
- Added `tbb::finalize` to tests and examples to address intermittent test crashes with TBB runtime (891a415, c79e543, 8312c3a, 1a32b95, bd0389d, f05013d, ab7938f, 31c9e7b, f3261e4, d58ac41, f8c67b9, 258849b, b20a8c7)
- Fixed segfault in `fp16` convolution primitive on future Intel Xeon processors (code name Granite Rapids) (a574fff)
- Fixed correctness issue in `fp16` convolution primitive on future Intel Xeon processors (code name Sierra Forest) (f165ed8)
- Fixed correctness issue in `int8` convolution primitive on Intel CPUs (ca15922, 27845b8)
- Fixed correctness issue in `int8` convolution primitive on Intel Data Center GPU Max Series (8bb651c)
- Fixed correctness issue in resampling primitive with post-ops on Intel CPUs (aa52a51)
- Addressed excessive memory consumption in 3D convolution on Intel CPUs (3d6412a, 097acb5, fd69663)
- Fixed segfault in convolution with `sum` and `relu` post-ops on Intel CPUs (63ad769, 1b13037, 0a8116b, 9972cb8)
- Addressed convolution performance regression with small number of channels on Intel GPUs (d3af877)
- Worked around MSVS 2019 bug resulting in build failures on Windows (4024775)
- Updated code base formatting to clang-format 11 (23576f9, 0b1bf84)