This is a patch release containing the following changes to v3.11:
- Fixed performance regression in `bf16` matmul with `int4` weights on Intel GPUs based on Xe2 architecture (d4d4d7a)
- Fixed performance regression in inner product primitive with transposed weights on x64 CPUs (c5d2d09)
- Updated benchdnn input files for matmul and convolution performance benchmarking (e80a1a8, 96d72a9, b9c9bce)
- Fixed an out-of-registers issue in SDPA fusion with Graph API on Intel GPUs (ba81382)
- Fixed integer overflow in softmax primitive implementation for Intel GPUs (4a711d7, b02cfa0, c557f33, ab64a9b)
- Fixed incorrect results in `f64` convolution weight gradient on Intel GPUs based on Xe-LPG architecture (adcb323, 3d1a7e4)
- Removed in-place optimization for reorder in Graph API to avoid correctness issues (a6c3630)
- Improved performance of `int8`, `f16`, and `bf16` convolution on processors with Intel AMX support (a418949)
- Fixed a correctness issue in `f32` convolution with a small number of input channels (3d1d9b4, ada85c5)
- Fixed a correctness issue in matmul with binary post-op and non-trivial strides on x64 CPUs (f49f470, 265df18, 5892570)
- Fixed benchdnn graph driver test to support non-trivial strides (0232763, 662cbb3)
- Fixed a correctness issue in 3D grouped convolution weight gradient on Intel GPUs (8a7996b)
- Fixed a page fault issue in `f32` SDPA subgraph on Intel GPUs (98845e5)
- Fixed a performance regression in `bf16` matmul on x64 CPUs with Intel AMX instruction set support (5b886e8, f3a79e7, 52cc900, cf9a11e)
- Fixed a segmentation fault in matmul on x64 processors with Intel AVX 10.2 and Intel AMX instruction set support (98aea2f)
- Fixed a correctness issue in SDPA subgraph with non-trivial strides for mask on Intel GPUs (0ccdfba)