oneDNN v2.0-beta10

This is a preview release for oneDNN v2.0. The release is based on oneDNN v1.7.

A binary distribution of this software is available as the Intel(R) oneAPI Deep Neural Network Library in Intel(R) oneAPI.

Performance optimizations

  • Intel Processor Graphics and Xe architecture-based Graphics:
    • Improved performance of convolutions and matmul primitives.
    • Improved performance of int8 convolutions for NHWC activations format.
  • Intel Architecture processors:
    • Improved performance of primitives for the NHWC activations format (see the layout sketch after this list).
    • Improved fp32 GEMM performance for small N.
    • Improved performance of int8 primitives for processors with Intel SSE4.1 instruction set support.
  • AArch64-based processors:
    • Added support for Arm Performance Library (ArmPL). ArmPL provides an optimized GEMM implementation for AArch64.
    • Added support for [Arm Compute Library (ArmCL)](https://github.com/arm-software/ComputeLibrary). ArmCL provides an optimized convolution implementation for AArch64.
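
The NHWC optimizations above take effect when activations use the NHWC memory format. Below is a minimal C++ sketch of requesting that layout through the memory descriptor API; the shapes are arbitrary illustration values. Note that logical dimensions are always passed in NCHW order, and the format tag selects the physical layout.

    #include "dnnl.hpp"

    int main() {
        using namespace dnnl;
        engine eng(engine::kind::cpu, 0);

        // Logical dims are always given in NCHW order; format_tag::nhwc
        // selects the physical NHWC layout targeted by the optimizations above.
        memory::desc act_md({8, 32, 14, 14}, memory::data_type::f32,
                memory::format_tag::nhwc);

        memory activations(act_md, eng); // library-allocated NHWC buffer
        (void)activations;
        return 0;
    }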

New Functionality

  • Added support for IBM Z (s390x) and IBM POWER (powerpc64) architectures.
  • Introduced RNN GRU for GPU.
  • Introduced int8 RNN GRU for CPU.
  • Introduced asymmetric quantization support for convolutions, matmul, and inner product.
  • Introduced dilated pooling support.
  • Extended the matmul primitive to support multiple batch dimensions and batch broadcast on CPU.
  • (preview) Introduced binary post-op for (de)convolution, pooling, eltwise, binary, inner product, and matmul (see the matmul sketch after this list).
  • (preview) Extended the number of supported post-ops for primitives to 20.
  • (preview) Introduced reduction primitive for CPU. Together with post-ops, this functionality makes it possible to implement normalization.
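
The following minimal sketch combines two of the features above: a matmul whose weights are broadcast across the batch dimension, with a binary post-op adding a broadcast tensor to the result. The shapes are illustrative, and the post-op argument macro shown (DNNL_ARG_ATTR_MULTIPLE_POST_OP) follows the 2.x preview API; check the headers shipped with this release for the exact spelling.

    #include <vector>
    #include "dnnl.hpp"

    using namespace dnnl;

    int main() {
        engine eng(engine::kind::cpu, 0);
        stream s(eng);

        // 3D matmul: src has batch size 4; weights use batch size 1 and are
        // broadcast across the batch.
        memory::desc src_md({4, 3, 5}, memory::data_type::f32, memory::format_tag::abc);
        memory::desc wei_md({1, 5, 2}, memory::data_type::f32, memory::format_tag::abc);
        memory::desc dst_md({4, 3, 2}, memory::data_type::f32, memory::format_tag::abc);

        // Binary post-op: add a tensor (broadcast over batch and rows) to the result.
        memory::desc bin_md({1, 1, 2}, memory::data_type::f32, memory::format_tag::abc);
        post_ops po;
        po.append_binary(algorithm::binary_add, bin_md);
        primitive_attr attr;
        attr.set_post_ops(po);

        matmul::desc md(src_md, wei_md, dst_md);
        matmul::primitive_desc pd(md, attr, eng);
        matmul prim(pd);

        std::vector<float> src(4 * 3 * 5, 1.f), wei(5 * 2, 1.f);
        std::vector<float> dst(4 * 3 * 2), bin(2, 0.5f);
        memory src_m(src_md, eng, src.data()), wei_m(wei_md, eng, wei.data());
        memory dst_m(dst_md, eng, dst.data()), bin_m(bin_md, eng, bin.data());

        prim.execute(s, {{DNNL_ARG_SRC, src_m}, {DNNL_ARG_WEIGHTS, wei_m},
                {DNNL_ARG_DST, dst_m},
                {DNNL_ARG_ATTR_MULTIPLE_POST_OP(0) | DNNL_ARG_SRC_1, bin_m}});
        s.wait();
        return 0;
    }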

Thanks to the contributors

This release contains contributions from the project core team as well as Ben Fitch, Brian Shi, David Edelsohn @edelsohn, Diana Bite @diaena, Moaz Reyad @moazreyad, Nathan John Sircombe @nSircombe, Niels Dekker @N-Dekker, Peter Caday @petercad, Pinzhen Xu @pinzhenx, pkubaj @pkubaj, Tsao Zhong @CaoZhongZ. We would also like to thank everyone who asked questions and reported issues.

Known Issues and Limitations

  • f32 convolutions may hang sporadically on Intel Processor Graphics Gen11. No workaround available.
  • Pooling, batch normalization, and binary primitives may segfault when executed on Xe architecture-based graphics. No workaround available.
  • oneDNN functionality may corrupt memory and crash the application on all GPU platforms when using the Level Zero runtime in USM mode. As a workaround, use SYCL buffers or the OpenCL runtime:
    export SYCL_BE=PI_OPENCL
  • The matmul primitive may hang on GPU with the Level Zero runtime on Windows. As a workaround, use the OpenCL runtime:
    export SYCL_BE=PI_OPENCL
  • Convolution may hang on GPU for shapes with 3 input channels. No workaround available.
  • Non-Intel GPUs are not supported. The library API allows creating a DNNL engine by index (the order of devices is determined by the SYCL runtime), and there is no check that the GPU device is an Intel device. For more control, users can create a DNNL engine by passing a SYCL device and context explicitly (see the sketch after this list).
  • GPU kernels that run longer than a certain time (which depends on the OS and system settings) may trigger the system's GPU watchdog timeout, making the application appear to hang. Driver or system settings can be configured to disable this timeout so that long-running DPC++ or OpenCL programs, including oneDNN examples, are not affected:
    • On Linux* (see more details in OpenCL™ Driver for Intel® HD, Iris™, and Iris™ Pro Graphics for Linux):
      $ sudo bash -c 'echo N > /sys/module/i915/parameters/enable_hangcheck'
    • On Windows* (see more details in Timeout Detection and Recovery (TDR) Registry Keys):
      Increase the TdrDelay and TdrDdiDelay values in the registry.
  • See DPC++ limitations that impact the library as well.
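
As noted in the non-Intel GPU limitation above, an engine can be created from an explicit SYCL device and context instead of by index. A minimal sketch follows, assuming the SYCL interop API shape of oneDNN 2.x (dnnl_sycl.hpp and dnnl::sycl_interop::make_engine); check the headers shipped with this beta for the exact spelling.

    #include <CL/sycl.hpp>
    #include "dnnl.hpp"
    #include "dnnl_sycl.hpp" // SYCL interop header; 2.x naming assumed here

    int main() {
        // Select a GPU explicitly so the engine never lands on an
        // unsupported (non-Intel) device picked by index.
        cl::sycl::device dev{cl::sycl::gpu_selector{}};
        cl::sycl::context ctx{dev};

        dnnl::engine eng = dnnl::sycl_interop::make_engine(dev, ctx);
        dnnl::stream strm(eng);
        (void)strm;
        return 0;
    }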
