github oneapi-src/oneDNN v2.1

3 years ago

Performance optimizations

  • Reduced overheads associated with primitive cache.

  • Intel Processor Graphics and Xe architecture-based Graphics

    • Improved performance of Winograd convolution.
    • Improved performance for padded memory formats.
    • Improved performance of reorder and shuffle primitives for multiple formats and all dimensions.
    • Improved performance of pooling primitive for float16 data type.
    • Improved performance of layer normalization (lnorm) primitive for plain formats.
    • Improved performance of resampling primitive for blocked formats.
  • Intel Architecture processors

    • Introduced initial optimizations for bfloat16 functionality for future Intel Xeon Scalable processor with Intel AMX support (code name Sapphire Rapids).
    • Improved performance of int8 and bfloat16 RNN and inner product primitives.
    • Improved performance of shuffle primitive for bfloat16 data type.
    • Introduced CPU ISA hints environment variable and API. The new API is intended to dispatch function implementations that use YMM registers, which improves performance on processors with a single Intel AVX-512 compute unit.
    • Improved forward convolution performance for Intel AVX-512 systems.
    • Introduced initial performance optimizations for future Intel Core processor with Intel AVX2 and Intel DL Boost instructions support (code name Alder Lake).
    • Improved performance of int8 primitive for processors with Intel SSE4.1 instruction set support.
    • Improved convolution and batch normalization performance with threadpool.
  • AArch64-based processors

    • Improved performance of Winograd convolution with ArmCL.
    • Improved performance of int8 convolution with ArmCL.
    • Added JIT support for AArch64 and JIT implementations for reorder, eltwise, pooling, and batch normalization primitives.
  • NVIDIA GPUs

    • (preview) Introduced support for NVIDIA GPU. The implementation relies on DPC++ Compiler, cuDNN, and cuBLAS libraries.
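
As a rough illustration of the CPU ISA hints item above, the sketch below builds an environment for a child process that asks oneDNN to prefer YMM-based implementations. The variable name `DNNL_CPU_ISA_HINTS` and the value `PREFER_YMM` are recalled from the oneDNN developer guide and are not stated in these notes, so treat them as assumptions to check against the oneDNN version in use. The helper `env_with_isa_hint` and the binary `./my_onednn_app` are hypothetical.

```python
import os


def env_with_isa_hint(hint="PREFER_YMM", base_env=None):
    """Return a copy of the environment with the ISA hint variable set.

    Assumption: oneDNN reads DNNL_CPU_ISA_HINTS at startup and accepts
    PREFER_YMM (name and value taken from memory of the developer guide).
    """
    env = dict(os.environ if base_env is None else base_env)
    env["DNNL_CPU_ISA_HINTS"] = hint
    return env


# Hypothetical usage: launch an application linked against oneDNN.
# import subprocess
# subprocess.run(["./my_onednn_app"], env=env_with_isa_hint(), check=True)
```

The hint is applied per process here, so the same machine can run one job with the hint and another without it for a side-by-side comparison.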

New Functionality

  • Introduced int8 support for LSTM primitive with projection for CPU.
  • Introduced binary post-op for (de)-convolution, pooling, eltwise, binary, inner product, matmul and reduction (GPU only) along with performance optimizations for CPUs and GPUs.
  • Extended the number of supported post-ops for primitives to 20.
  • Extended eltwise primitive with support for logsigmoid and clip_v2 algorithms.
  • Introduced support for PRelu primitive.
  • Extended matmul implementation with support for per-output channel zero-points for quantization.
  • Extended support for broadcasting in binary primitive to both inputs for CPU.
  • Introduced float16 support in reduction primitive for GPU.
  • Introduced support for mixed input and output types in binary primitive for GPU.
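
For readers unfamiliar with the new element-wise algorithms and the PReLU primitive listed above, here is a minimal reference sketch of their forward math in plain Python. It is not oneDNN code: the function names are hypothetical, and the per-channel weight layout in `prelu_per_channel` is only one of the broadcast shapes a real implementation may accept. For the clipping algorithm, only the forward pass is shown, on the assumption that it is a plain clamp to `[alpha, beta]`.

```python
import math


def logsigmoid(x):
    """log(1 / (1 + exp(-x))), written to avoid overflow for large |x|."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def clip_forward(x, alpha, beta):
    """Forward pass of clipping: clamp x to the interval [alpha, beta]."""
    return max(alpha, min(x, beta))


def prelu_per_channel(src, weights):
    """PReLU forward: keep positive values, scale the rest by a learned weight.

    src     -- list of channels, each a list of floats
    weights -- one slope per channel (assumed per-channel broadcast)
    """
    return [
        [v if v > 0 else v * w for v in channel]
        for channel, w in zip(src, weights)
    ]
```

A sketch like this can serve as a scalar reference when checking results from an optimized implementation on small inputs.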

Usability

  • Added API to enable displaying timestamps in oneDNN verbose mode. Timestamps make it possible to use oneDNN verbose output in profiling tools.

Validation

  • Extended benchdnn to report operation bandwidth.
  • Added ability to choose target GPU in benchdnn.

Thanks to the contributors

This release contains contributions from the project core team as well as Alejandro Alvarez, Aleksandr Nikolaev @alenik01, araki.kenichi @qnet-araki, Arthur Mitrano @aaraujom, Benjamin Fitch, Ben Tracy @CodeplayBen, Daniel Soutar @danielsoutar, @dylan-angus-codeplay, Diana Bite @diaena, higuchi.motoko @higuchi-motoko, Jacob Kahn @jacobkahn, Kentaro Kawakami @kawakami-k, Kumudha KN @KumudhaN, kurihara @Koji-Kurihara, Mehdi Goli @mehdi-goli, Nathan John Sircombe @nSircombe, Peter Caday @petercad, Rafik Saliev @rfsaliev, Xinyu Chen @xinyu-intel, yuri@FreeBSD @yurivict. We would also like to thank everyone who asked questions and reported issues.
