uxlfoundation/oneDNN v3.11.1


This is a patch release containing the following changes to v3.11:

  • Fixed a performance regression in bf16 matmul with int4 weights on Intel GPUs based on Xe2 architecture (d4d4d7a)
  • Fixed a performance regression in the inner product primitive with transposed weights on x64 CPUs (c5d2d09)
  • Updated benchdnn input files for matmul and convolution performance benchmarking (e80a1a8, 96d72a9, b9c9bce)
  • Fixed an out-of-registers issue in SDPA fusion with Graph API on Intel GPUs (ba81382)
  • Fixed an integer overflow in the softmax primitive implementation for Intel GPUs (4a711d7, b02cfa0, c557f33, ab64a9b)
  • Fixed incorrect results in f64 convolution weight gradient on Intel GPUs based on Xe-LPG architecture (adcb323, 3d1a7e4)
  • Removed in-place optimization for reorder in Graph API to avoid correctness issues (a6c3630)
  • Improved performance of int8, f16, and bf16 convolution on processors with Intel AMX support (a418949)
  • Fixed a correctness issue in f32 convolution with a small number of input channels (3d1d9b4, ada85c5)
  • Fixed a correctness issue in matmul with binary post-op and non-trivial strides on x64 CPUs (f49f470, 265df18, 5892570)
  • Fixed benchdnn graph driver test to support non-trivial strides (0232763, 662cbb3)
  • Fixed a correctness issue in 3D grouped convolution weight gradient on Intel GPUs (8a7996b)
  • Fixed a page fault issue in f32 SDPA subgraph on Intel GPUs (98845e5)
  • Fixed a performance regression in bf16 matmul on x64 CPUs with Intel AMX instruction set support (5b886e8, f3a79e7, 52cc900, cf9a11e)
  • Fixed a segmentation fault in matmul on x64 processors with Intel AVX 10.2 and Intel AMX instruction set support (98aea2f)
  • Fixed a correctness issue in SDPA subgraph with non-trivial strides for the mask on Intel GPUs (0ccdfba)
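Projects that build oneDNN from source can pick up these fixes by pinning the release tag. A minimal CMake sketch, assuming a CMake-based consumer; the target name `my_app` is a placeholder, and disabling tests and examples is an optional choice, not a requirement of this release:

```cmake
include(FetchContent)

# Pin the v3.11.1 patch release of oneDNN.
FetchContent_Declare(
  oneDNN
  GIT_REPOSITORY https://github.com/uxlfoundation/oneDNN.git
  GIT_TAG        v3.11.1
)

# Optional: skip oneDNN's own tests and examples to speed up the build.
set(DNNL_BUILD_TESTS    OFF CACHE BOOL "" FORCE)
set(DNNL_BUILD_EXAMPLES OFF CACHE BOOL "" FORCE)

FetchContent_MakeAvailable(oneDNN)

# my_app is a placeholder for your own target.
target_link_libraries(my_app PRIVATE dnnl)
```

Prebuilt packages and other integration options are covered in the oneDNN documentation; the above only shows one way to track this tag.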
