This is a patch release containing the following changes to v3.11:
- Fixed performance regression in `bf16` matmul with `int4` weights on Intel GPUs based on Xe2 architecture (d4d4d7a)
- Fixed performance regression in inner product primitive with transposed weights on x64 CPUs (c5d2d09)
- Updated benchdnn input files for matmul and convolution performance benchmarking (e80a1a8, 96d72a9, b9c9bce)
- Fixed an out-of-registers issue in SDPA fusion with Graph API on Intel GPUs (ba81382)
- Fixed integer overflow in softmax primitive implementation for Intel GPUs (4a711d7, b02cfa0, c557f33, ab64a9b)
- Fixed incorrect results in `f64` convolution weight gradient on Intel GPUs based on Xe-LPG architecture (adcb323, 3d1a7e4)
- Removed in-place optimization for reorder in Graph API to avoid correctness issues (a6c3630)
- Improved performance of `int8`, `f16`, and `bf16` convolution on processors with Intel AMX support (a418949)
- Fixed a correctness issue in `f32` convolution with a small number of input channels (3d1d9b4, ada85c5)
- Fixed a correctness issue in matmul with binary post-op and non-trivial strides on x64 CPUs (f49f470, 265df18, 5892570)
- Fixed benchdnn graph driver test to support non-trivial strides (0232763, 662cbb3)
- Fixed a correctness issue in 3D grouped convolution weight gradient on Intel GPUs (8a7996b)
- Fixed a page fault issue in `f32` SDPA subgraph on Intel GPUs (98845e5)
- Fixed a performance regression in `bf16` matmul on x64 CPUs with Intel AMX instruction set support (5b886e8, f3a79e7, 52cc900, cf9a11e)
- Fixed a segmentation fault in matmul on x64 processors with Intel AVX 10.2 and Intel AMX instruction set support (98aea2f)
- Fixed a correctness issue in SDPA subgraph with non-trivial strides for mask on Intel GPUs (0ccdfba)