This is a preview release for oneDNN v2.0. The release is based on oneDNN v1.6.
Binary distribution of this software is available as Intel(R) oneAPI Deep Neural Network Library in Intel(R) oneAPI.
Performance Optimizations
Intel Architecture processors
- Introduced initial int8 optimizations for future Intel Xeon Scalable processors (code name Sapphire Rapids). The functionality is disabled by default and should be enabled via the CPU dispatcher control (see the sketch after this list).
- Improved matmul and inner product performance with bfloat16 data type.
- Improved performance of the tanh algorithm for the eltwise primitive and LSTM cells.
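A minimal sketch of enabling an off-by-default ISA through the CPU dispatcher control is shown below. DNNL_MAX_CPU_ISA and dnnl::set_max_cpu_isa() are the documented dispatcher controls; the AVX512_CORE_AMX token is an assumption for this preview, so check the CPU dispatcher control documentation that ships with your build for the exact name.

```cpp
// Sketch only: enable an off-by-default ISA via the CPU dispatcher control.
// The "AVX512_CORE_AMX" token is an assumption for this preview; consult the
// CPU dispatcher control documentation for the exact name in your build.
#include <cstdlib>
#include "dnnl.hpp"

int main() {
    // Must be set before the first oneDNN call; equivalent to launching with
    //   DNNL_MAX_CPU_ISA=AVX512_CORE_AMX ./app
    setenv("DNNL_MAX_CPU_ISA", "AVX512_CORE_AMX", /*overwrite=*/1);

    // The programmatic control is dnnl::set_max_cpu_isa(); it can be called
    // once before any primitive is created, e.g.:
    //   dnnl::set_max_cpu_isa(dnnl::cpu_isa::avx512_core);

    dnnl::engine cpu_engine(dnnl::engine::kind::cpu, 0);
    // ... create int8 primitives as usual; the dispatcher may now select the
    // new kernels on supporting hardware.
    return 0;
}
```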
Intel Processor Graphics and Xe architecture-based Graphics
- Improved performance of Convolution, RNN, Inner Product and Matmul functionality for all supported GPUs.
- Improved performance of int8 convolutions with activations in NHWC format for Xe architecture-based Graphics (code named DG1 and Tiger Lake).
New Functionality
- Introduced support for processors based on IBM POWER architecture.
- Introduced Linear-Before-Reset GRU for GPU.
- Extended the eltwise primitive with support for the round operation (see the sketch after this list).
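As a quick illustration of the new algorithm kind, the sketch below rounds a small f32 tensor in place with an eltwise forward primitive; the shape, layout, and in-place execution are arbitrary choices for the example.

```cpp
// Sketch: rounding each element of a tensor with the new eltwise round algorithm.
#include <vector>
#include "dnnl.hpp"

int main() {
    using namespace dnnl;
    engine eng(engine::kind::cpu, 0);
    stream s(eng);

    // A small f32 tensor; shape and layout are arbitrary for this example.
    memory::desc md({2, 3}, memory::data_type::f32, memory::format_tag::ab);
    std::vector<float> data = {0.2f, 1.7f, -2.5f, 3.1f, -0.4f, 5.5f};
    memory src_mem(md, eng, data.data());

    // eltwise_round takes no alpha/beta parameters.
    auto desc = eltwise_forward::desc(
            prop_kind::forward_inference, algorithm::eltwise_round, md, 0.f, 0.f);
    auto pd = eltwise_forward::primitive_desc(desc, eng);

    // Execute in place: src and dst alias the same memory object.
    eltwise_forward(pd).execute(
            s, {{DNNL_ARG_SRC, src_mem}, {DNNL_ARG_DST, src_mem}});
    s.wait();
    return 0;
}
```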
Usability
- Reduced primitive creation time by enabling the OpenCL pre-compiled headers feature available in recent versions of the OpenCL driver.
- Reduced the entitlement required on macOS with hardened runtime to allow-jit.
- Extended documentation on runtime and build-time controls for JIT profiler support, the primitive cache, the CPU dispatcher control, and verbose mode (see the sketch below).
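For readers new to these controls, below is a minimal sketch of a few documented runtime knobs set from code before the library is first used; the same settings can be supplied as environment variables when launching the application, and the specific values are only illustrative.

```cpp
// Sketch: a few documented runtime controls, set before oneDNN is first used.
// The same knobs can be passed as environment variables on the command line.
#include <cstdlib>
#include "dnnl.hpp"

int main() {
    // Print one line per primitive creation and execution (DNNL_VERBOSE=1).
    setenv("DNNL_VERBOSE", "1", /*overwrite=*/1);

    // Cap the primitive cache (DNNL_PRIMITIVE_CACHE_CAPACITY); 0 disables it.
    setenv("DNNL_PRIMITIVE_CACHE_CAPACITY", "512", 1);

    // Emit JIT code information for profilers such as VTune (DNNL_JIT_PROFILE).
    setenv("DNNL_JIT_PROFILE", "1", 1);

    dnnl::engine eng(dnnl::engine::kind::cpu, 0);
    // ... create and execute primitives; verbose output goes to stdout.
    return 0;
}
```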
Validation
- Introduced a validation mode for out-of-memory situations.
Known Issues and Limitations
- RNN primitives do not work with the Level Zero GPU runtime. The workaround is to use the OpenCL GPU runtime by setting SYCL_BE=PI_OPENCL before running a DPC++ program.
- Optimized primitives can crash or fail for huge spatial sizes on CPU.
- f32 convolutions may fail sporadically on Intel® Processor Graphics Gen11 due to a known issue in Intel Graphics Compiler.
- Non-Intel GPUs are not supported. The library API allows creating a DNNL engine by index (the order of devices is determined by the SYCL runtime), and there is no check that the chosen GPU device is an Intel device. For more control, users can create a DNNL engine by passing a SYCL device and context explicitly (see the sketch at the end of this section).
- When running GPU kernels that take longer than a certain time (which depends on the OS and system settings), the application may appear to hang. The driver or system settings can be configured to disable this timeout and avoid hangs of DPC++ or OpenCL programs, including oneDNN examples:
  - On Linux* (see more details at OpenCL™ Driver for Intel® HD, Iris™, and Iris™ Pro Graphics for Linux):
$ sudo bash -c 'echo N > /sys/module/i915/parameters/enable_hangcheck'
  - On Windows* (see more details at Timeout Detection and Recovery (TDR) Registry Keys): increase the TdrDelay and TdrDdiDelay values in the registry.
- See DPC++ limitations that impact the library as well.
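To make the GPU choice explicit, here is a minimal sketch that selects a SYCL GPU device and builds the oneDNN engine from that device and its context. The dnnl::sycl_interop::make_engine entry point and the dnnl_sycl.hpp header follow the later oneDNN 2.x interop layout and are assumptions for this preview, which may instead expose an engine constructor taking a SYCL device and context.

```cpp
// Sketch: build a oneDNN engine from an explicitly chosen SYCL device rather
// than the index-based constructor. The interop entry point below
// (dnnl::sycl_interop::make_engine) and the header name are assumptions for
// this preview; some versions expose an engine constructor taking the same
// arguments instead.
#include <CL/sycl.hpp>
#include "dnnl.hpp"
#include "dnnl_sycl.hpp"

int main() {
    // Enumerate GPU devices and pick one explicitly (e.g., filter by vendor).
    auto gpus = cl::sycl::device::get_devices(cl::sycl::info::device_type::gpu);
    if (gpus.empty()) return 1;
    cl::sycl::device dev = gpus[0];
    cl::sycl::context ctx(dev);

    // Create the engine from the chosen device and its context.
    dnnl::engine eng = dnnl::sycl_interop::make_engine(dev, ctx);
    dnnl::stream s(eng);
    // ... create and execute primitives on the explicitly selected device.
    return 0;
}
```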