oneapi-src/oneDNN v2.0-beta07 (GitHub)

This is a preview release for oneDNN v2.0. The release is based on oneDNN v1.5.

Binary distribution of this software is available as Intel(R) oneAPI Deep Neural Network Library in Intel(R) oneAPI.

Performance optimizations

Intel Architecture processors

  • Improved performance of convolutional neural networks (CNN) related functionality with NHWC activations on all supported processors
  • Improved binary primitive performance for the broadcast case
  • Improved performance of eltwise primitive backpropagation and corresponding post-ops
  • Improved performance of pooling, resampling, LRN primitives
  • Improved performance of bfloat16 and fp32 weights gradient convolutions with groups
  • Improved performance of int8 convolutions with 1x1 kernel and spatial strides

Intel Processor Graphics and Xe architecture-based Graphics

  • Introduced initial optimizations for Xe architecture-based Graphics (code named DG1 and Tiger Lake).
  • Improved performance of convolutional neural networks (CNN) related functionality with NHWC activations.

New Functionality

  • Level Zero (L0) GPU runtime is used by default on the Windows* operating system. The OpenCL GPU runtime can still be used if the SYCL_BE environment variable is set to PI_OPENCL before running a DPC++ program.
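
As a minimal sketch of selecting the OpenCL runtime, the POSIX shell helper below sets SYCL_BE for a single command only, so the rest of the session keeps the default runtime. The helper name run_with_opencl and the binary name in the usage comment are hypothetical and not part of oneDNN. In a Windows command prompt, the equivalent would be to run `set SYCL_BE=PI_OPENCL` before starting the program.

```shell
# Hypothetical helper: run one command with the OpenCL GPU runtime selected.
# The variable is set only in the environment of that command.
run_with_opencl() {
    SYCL_BE=PI_OPENCL "$@"
}

# Usage (my_dpcpp_app is a placeholder for your DPC++ program):
#   run_with_opencl ./my_dpcpp_app
```

Scoping the variable to one invocation makes it easy to compare runs under both runtimes without editing the shell profile.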

Usability

Validation

  • Introduced validation mode to detect out of bounds access.

Known Limitations

  • RNN primitives do not work with the Level Zero GPU runtime. As a workaround, use the OpenCL GPU runtime by setting SYCL_BE=PI_OPENCL before running a DPC++ program.
  • Optimized primitives can crash or fail for huge spatial sizes on CPU.
  • f32 convolutions may fail sporadically on Intel® Processor Graphics Gen11 due to a known issue in Intel Graphics Compiler.
  • Non-Intel GPUs are not supported. The library API allows creating a DNNL engine by index (the order of devices is determined by the SYCL runtime), and it does not check whether a GPU device is non-Intel. For more control, users can create a DNNL engine by passing a SYCL device and context explicitly.
  • GPU kernels that run longer than a certain time (the limit depends on the OS and system settings) may cause the application to appear to hang. Configure the driver to disable this timeout to avoid hangs in DPC++ or OpenCL programs, including DNNL examples.

On Linux:

$ sudo bash -c 'echo N > /sys/module/i915/parameters/enable_hangcheck'

On Windows, increase the TdrDelay and TdrDdiDelay values in the registry.
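
To confirm the Linux setting took effect, the current value of the hangcheck parameter can be read back from the same sysfs path. The sketch below assumes the parameter reads as Y or N (the usual form for boolean kernel module parameters). The function name hangcheck_status and its optional path argument are hypothetical, added only so the check can be tried against an arbitrary file.

```shell
# Hypothetical helper: report whether the i915 hangcheck timeout is active.
# Takes an optional path; defaults to the sysfs file named in the notes above.
hangcheck_status() {
    param_file="${1:-/sys/module/i915/parameters/enable_hangcheck}"
    if [ ! -r "$param_file" ]; then
        # No i915 driver loaded, or the file is not readable by this user.
        echo "unknown"
        return 1
    fi
    case "$(cat "$param_file")" in
        N|0) echo "disabled" ;;
        Y|1) echo "enabled" ;;
        *)   echo "unknown" ;;
    esac
}

# Usage:
#   hangcheck_status
```

Note that a value written to sysfs this way typically does not persist across reboots, so the check is worth repeating after a restart.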
