oneapi-src/oneDNN v3.0 (GitHub release)

Performance Optimizations

  • Intel Architecture Processors:
    • Improved performance for 4th generation Intel Xeon Scalable processor (formerly Sapphire Rapids).
    • Introduced FP16 support and initial optimizations for future Intel Xeon Scalable processor (code name Granite Rapids). The functionality is disabled by default and should be enabled via CPU dispatcher control.
  • Intel Graphics Products:
    • Improved performance for Intel Data Center GPU Max Series (formerly Ponte Vecchio).
    • Improved performance for Intel Arc graphics (formerly Alchemist and DG2) and Intel Data Center GPU Flex Series (formerly Arctic Sound-M).
  • AArch64-based Processors:
    • Improved reorder performance for processors with Scalable Vector Extension (SVE) support.
    • Improved pooling performance with post-ops for processors with SVE 512 support.
    • Improved batch normalization performance with non-default flags for processors with SVE 512 support.
    • Improved performance of FP16 functionality with Compute Library for the Arm Architecture (ACL).
    • Improved deconvolution performance with ACL.
  • PowerPC64-based Processors:
    • Improved int8 GEMM performance.

Functionality

  • Introduced a new quantization scheme. Major changes include support for per-argument runtime scales in all primitives and for unquantized bias.
  • [experimental] Introduced Graph API support, which simplifies oneDNN integration into applications. The functionality is disabled by default and can be enabled at build time with the ONEDNN_BUILD_GRAPH=ON flag.
  • Introduced support for Intel DPC++/C++ Compiler 2023.0, including new features from the SYCL 2020 standard.
  • Extended the persistent cache to cover the GPU engine object. This allows applications to further reduce oneDNN initialization time.
  • Extended the threadpool API with a function to indicate the maximum available concurrency.
  • Extended the binary primitive implementation on GPU with support for bfloat16 source and int8 destination.
  • Introduced support for pooling and reduction primitives on AMD GPUs.
  • Introduced support for the reduction primitive on NVIDIA GPUs.
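
The sketch below shows the arithmetic behind per-argument scales and an unquantized bias. It computes one int8 dot-product output in plain C++ and rests on three assumptions:

  • Every argument's scale maps quantized values back to real ones (real = scale * quantized), so the destination scale divides on the way out.
  • The bias stays in f32 and is added after dequantization.
  • Conversion to int8 rounds to nearest and saturates.

This is a reference-style illustration, not oneDNN's API. The helper name `quantized_dot` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference helper, not part of oneDNN: one output element of an
// int8 matmul/convolution with per-argument scales and an f32 (unquantized) bias.
// Assumed convention: real_value = scale * quantized_value for every argument.
std::int8_t quantized_dot(const std::vector<std::int8_t> &src, float src_scale,
                          const std::vector<std::int8_t> &wei, float wei_scale,
                          float bias_f32, float dst_scale) {
    assert(src.size() == wei.size());
    // Accumulate int8 products in int32.
    std::int32_t acc = 0;
    for (std::size_t i = 0; i < src.size(); ++i)
        acc += std::int32_t(src[i]) * std::int32_t(wei[i]);
    // Dequantize with the source and weights scales, then add the f32 bias.
    float dst_f32 = src_scale * wei_scale * float(acc) + bias_f32;
    // Quantize into the destination: divide by its scale, round to nearest,
    // then saturate to the int8 range.
    float q = std::nearbyint(dst_f32 / dst_scale);
    q = std::min(127.0f, std::max(-128.0f, q));
    return std::int8_t(q);
}
```

For example, src = {10, -20, 30} and wei = {1, 2, 3} accumulate to 60. Scales of 0.5 and 0.25 turn that into 7.5, and a bias of 0.5 brings it to 8.0. A destination scale of 0.5 then yields 16.

Keeping the bias in f32 means it is never rescaled by the source and weights scales. Because those scales are runtime values, they can change between executions without re-quantizing the bias.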

Usability

  • Extended the set of supported format tags to cover formats used in applications.
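
A format tag describes a tensor's physical layout. For plain (non-blocked) tags, each letter names a logical dimension (a = first, b = second, ...). The letters run from outermost to innermost in memory, so for a 4D tensor with logical order N, C, H, W, the tag `abcd` is NCHW and `acdb` is NHWC.

The sketch below derives dense strides from such a tag. It handles plain tags only; blocked tags such as `aBcd16b` are out of scope. `strides_from_tag` is a hypothetical helper, not a oneDNN function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not a oneDNN API): dense strides, indexed by logical
// dimension, for a plain format tag such as "abcd" (NCHW) or "acdb" (NHWC).
std::vector<std::int64_t> strides_from_tag(const std::string &tag,
                                           const std::vector<std::int64_t> &dims) {
    assert(tag.size() == dims.size());
    std::vector<std::int64_t> strides(dims.size(), 0);
    std::int64_t stride = 1;
    // Walk from the innermost (last) letter outward, multiplying in each size.
    for (std::size_t i = tag.size(); i-- > 0;) {
        std::size_t logical = std::size_t(tag[i] - 'a');
        assert(logical < dims.size());
        strides[logical] = stride;
        stride *= dims[logical];
    }
    return strides;
}
```

With dims {2, 3, 4, 5}, the tag `abcd` gives strides {60, 20, 5, 1}. The tag `acdb` gives {60, 1, 15, 3}, so channels become the unit-stride dimension.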

Validation

  • Extended the GoogleTest (gtest) suite with support for Parametric Rectified Linear Unit (PReLU) primitive.
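
PReLU passes positive inputs through unchanged and multiplies negative inputs by a learned slope. The sketch below is a minimal reference implementation, assuming an NCHW-style buffer with one slope per channel. This is the kind of reference a gtest case might compare a primitive's output against.

`prelu_per_channel` is a hypothetical helper, not code taken from the oneDNN test suite. The per-channel broadcast is likewise an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference PReLU: dst = src if src > 0, else slope[c] * src,
// for a buffer laid out as N x C x spatial with one slope per channel.
std::vector<float> prelu_per_channel(const std::vector<float> &src,
                                     std::size_t channels, std::size_t spatial,
                                     const std::vector<float> &slopes) {
    assert(slopes.size() == channels);
    assert(channels * spatial > 0 && src.size() % (channels * spatial) == 0);
    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        std::size_t c = (i / spatial) % channels;  // channel of element i
        float x = src[i];
        dst[i] = x > 0.0f ? x : slopes[c] * x;
    }
    return dst;
}
```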

Breaking Changes

  • Removed deprecated APIs.
  • Removed the operation descriptor object and made the memory descriptor object opaque. See details in the operation and memory descriptors RFC.
  • Removed support for creation-time primitive scales and for primitive output scales. See details in the quantization scaling RFC.
  • Removed support for Intel DPC++/C++ Compiler with SYCL 1.2.1 (aka SYCL 2017) standard.
  • Removed Winograd convolution implementation for int8 data type.
  • Updated the minimum supported ACL version to 22.08 (was 22.05).

Thanks to the Contributors

This release contains contributions from the project core team as well as @akshatasangelkar, Aryan Karumuri @AryanKarumuri, Crefeda Rodrigues @cfRod, Divakar Mariyanna @bmdivakar, Gordon Fossum @austinpagan, Jonathan Deakin @jondea, Kentaro Kawakami @kawakami-k, lilianhuang @lilh9598, Milos Puzovic @milpuz01, Mona Minakshi @monaminakshi, Nathan John Sircombe @nSircombe, Peter Caday @petercad, and Sreekanth Yalachigere @sreekanth-yalachigere. We would also like to thank everyone who asked questions and reported issues.