oneDNN v3.8-rc (github.com/uxlfoundation/oneDNN)

Pre-release, 15 days ago. Latest stable release: v3.7.3.

Performance Optimizations

Intel Architecture Processors

  • Improved performance of matmul and inner product primitives on processors with Intel AMX instruction set support.
  • Improved performance of convolution and inner product primitives on processors with Intel AVX2 instruction set support.
  • Improved int8 convolution performance with zero points.
  • Improved fp32 convolution performance with fp16 and bf16 compressed weights on processors with Intel AVX2 or Intel AVX-512 instruction set support.
  • Improved fp16/bf16 depthwise convolution performance with fp32 bias or sum post-ops or dilation.
  • Improved bf16 pooling backpropagation performance.
  • Improved binary post-ops performance with per_w broadcast.
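To make the per_w broadcast item concrete, here is an illustrative NumPy sketch (not the oneDNN API; the function name is hypothetical) of what a binary post-op with per_w broadcast computes: a 1 x 1 x 1 x W operand is applied across an N x C x H x W activation tensor, one value per width position.

```python
import numpy as np

# Illustrative sketch, not the oneDNN API: a binary post-op with
# per_w broadcast combines an N x C x H x W tensor with a
# 1 x 1 x 1 x W operand, one value per width position.
def binary_add_per_w(acts, per_w):
    assert per_w.shape == (1, 1, 1, acts.shape[-1])
    return acts + per_w  # NumPy broadcasting mirrors the per_w semantics
```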

Intel Graphics Products

  • Improved performance on Intel GPUs based on Xe3 architecture.
  • Improved convolution performance on:
    • Intel Arc Graphics for Intel Core Ultra (Series 2, formerly Lunar Lake).
    • Intel Arc B-series discrete graphics (formerly Battlemage).
  • Improved int8 matmul performance with zero-points support for source and weight tensors.
  • Improved f4_e2m1 and f4_e3m0 matmul and reorder performance.
  • Improved performance of several subgraphs executed through Graph API.

AArch64-based Processors

  • Improved fp16 reorder performance.
  • Improved int8 matmul performance.
  • Improved bf16 inner product forward propagation performance with Arm Compute Library (ACL).
  • Improved convolution performance with ACL on processors with SVE support.

Functionality

Common

  • Extended the Graph API Softmax operation to support an inf_as_zero mode. This enables SDPA subgraphs compliant with PyTorch safe softmax semantics.
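The safe softmax behavior referenced above can be sketched numerically (an illustrative NumPy reference, not the oneDNN API): a row that is entirely -inf, such as a fully masked attention row in SDPA, produces zeros instead of the NaNs a naive softmax would emit.

```python
import numpy as np

# Sketch of safe-softmax semantics (assumed behavior of inf_as_zero):
# fully masked rows (all -inf) yield zeros instead of NaN.
def safe_softmax(x, axis=-1):
    m = np.max(x, axis=axis, keepdims=True)
    m = np.where(np.isneginf(m), 0.0, m)        # avoid -inf - (-inf) = NaN
    e = np.exp(x - m)
    s = np.sum(e, axis=axis, keepdims=True)
    out = np.zeros_like(e)
    np.divide(e, s, out=out, where=(s != 0.0))  # masked rows stay 0
    return out
```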

Intel Architecture Processors

  • Introduced support for fp32 convolution with fp16 compressed weights.
  • Enabled int8/int4 compressed weights support in matmul primitive.
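The compressed-weights feature can be sketched as follows (assumed reference numerics, not the oneDNN API): weights are stored as int8/int4 together with per-output-channel scales and dequantized on the fly before the floating-point matmul.

```python
import numpy as np

# Reference sketch (assumed numerics, not the oneDNN API): matmul
# with compressed weights keeps weights as int8/int4 plus
# per-output-channel scales and dequantizes them on the fly.
def matmul_compressed_weights(src_f32, wei_int, scales):
    wei_f32 = wei_int.astype(np.float32) * scales  # dequantize K x N by (N,)
    return src_f32 @ wei_f32
```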

Intel Graphics Products

  • Introduced support for the select algorithm in the binary primitive.
  • Introduced support for f4_e2m1 and f4_e3m0 data types in convolution.
  • Introduced support for the GenIndex operation in Graph API.
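For readers unfamiliar with the 4-bit float types above, here is a small decoder sketch for f4_e2m1, assuming the common e2m1 layout (1 sign bit, 2 exponent bits with bias 1, 1 mantissa bit); the eight non-negative representable values are then 0, 0.5, 1, 1.5, 2, 3, 4, and 6.

```python
# Sketch of f4_e2m1 decoding, assuming the common 4-bit layout:
# 1 sign bit, 2 exponent bits (bias 1), 1 mantissa bit.
def decode_f4_e2m1(nibble):
    sign = -1.0 if (nibble >> 3) & 1 else 1.0
    exp = (nibble >> 1) & 0x3
    man = nibble & 0x1
    if exp == 0:                        # subnormal: 0 or 0.5
        return sign * man * 0.5
    return sign * (1.0 + 0.5 * man) * 2.0 ** (exp - 1)
```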

Generic GPU Vendor

  • Introduced support for:
    • Vanilla RNN forward propagation
    • Inner product backpropagation
    • Group normalization
  • Improved accuracy of inner product primitive with sum post-ops for large shapes.
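To clarify what the sum post-op in the last item computes, here is an assumed reference sketch (not the oneDNN API): the primitive result is accumulated into the existing contents of the destination tensor, optionally scaled.

```python
import numpy as np

# Sketch of inner product with a sum post-op (assumed reference
# numerics): the result is accumulated into the existing destination:
#   dst = src @ wei.T + bias + scale * dst_old
def inner_product_sum(src, wei, bias, dst_old, scale=1.0):
    return src @ wei.T + bias + scale * dst_old
```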

NVIDIA GPUs

  • Introduced Graph API support.

Usability

  • Added group normalization primitive support to the ONEDNN_ENABLE_PRIMITIVE build option.
  • Enabled support for ROCm 6 on AMD GPUs.
  • Improved CMake integration for oneDNN installation with the NVIDIA backend enabled.
  • Reduced memory footprint for matmul primitive when using ACL.

Validation

  • Added benchdnn option --execution-mode to test oneDNN functionality with SYCL Graph record/execute mode.
  • Extended benchdnn option --cold-cache with support for cold TLB mode.
  • Added benchdnn option --bia-dt to control bias data type for matmul, inner product, convolution, and deconvolution.
  • Extended syntax of benchdnn --dt option in Graph API driver to manage data types of individual tensors in a pattern.

Breaking Changes

Thanks to our Contributors

This release contains contributions from the project core team as well as Aditya Tewari @aditew01, Alexander Simonov @asimonov1, Denis @redradist, Dmitriy Ovchinnikov @inteldimitrius, Eliezer Weissmann @eliezerweissmann, Hubert Maciak @hmaciak, Ilya Lavrenov @ilya-lavrenov, James McGregor @Jmc18134, @jstachowintel, Marek Michalowski @michalowski-arm, Maria Zhukova @mzhukova, Orel Yehuda @yehudaorel, Ravi Pushkar @rpushkarr, Renato Barros Arantes @renato-arantes, @Shreyas-fuj, Shu Chen @shu1chen, Viktoriia Gvozdeva @vgvozdeva, Yair Obodovsky @yair-obodovsky, and @zhangfeiv0.
