Performance Optimizations

Intel Architecture processors

Introduced initial int8 optimizations for the future Intel Xeon Scalable processor (code name Sapphire Rapids). The functionality is disabled by default and should be enabled via CPU dispatcher control.
Improved matmul and inner product performance with bfloat16 data type.
Improved performance of the tanh algorithm for the eltwise primitive and LSTM cells.
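Because the Sapphire Rapids int8 optimizations are off by default, the following is a minimal sketch of opting in from the shell. It assumes the CPU dispatcher control is driven by the DNNL_MAX_CPU_ISA environment variable and that AVX512_CORE_AMX is the value covering this ISA in the installed version; check the CPU dispatcher control documentation for your release. DNNL_VERBOSE is the library's verbose mode, and the application name in the last comment is hypothetical.

```shell
# Assumption: AVX512_CORE_AMX is the dispatcher value that covers the
# Sapphire Rapids int8 kernels in the installed oneDNN version.
export DNNL_MAX_CPU_ISA=AVX512_CORE_AMX

# Optional: verbose mode reports which implementation each primitive uses,
# which helps confirm that the setting took effect.
export DNNL_VERBOSE=1

echo "DNNL_MAX_CPU_ISA=${DNNL_MAX_CPU_ISA} DNNL_VERBOSE=${DNNL_VERBOSE}"

# Then start the application from this same shell, for example:
# ./your_onednn_app   (hypothetical binary linked against oneDNN)
```

The variables only affect processes started from the shell that exported them, so the setting has to be in place before the application launches.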

Improved performance of Convolution, RNN, Inner Product and Matmul functionality for all supported GPUs.
Improved performance of int8 convolutions with activations in NHWC format for Xe architecture-based Graphics (code named DG1 and Tiger Lake).

Reduced primitive creation time by enabling the OpenCL pre-compiled headers feature available in recent versions of the OpenCL driver.
Reduced the entitlements required on macOS with hardened runtime to allow-jit.
Extended documentation on runtime and build time controls for JIT profiler support, primitive cache, CPU dispatcher controls, and verbose mode.

RNN functionality does not work with the Level Zero GPU runtime. The workaround is to use the OpenCL GPU runtime by setting SYCL_BE=PI_OPENCL before running a DPC++ program.
Optimized primitives can crash or fail for huge spatial sizes on CPU.
f32 convolutions may fail sporadically on Intel® Processor Graphics Gen11 due to a known issue in Intel Graphics Compiler.
Non-Intel GPUs are not supported. The library API allows creating a DNNL engine by index (the order of devices is determined by the SYCL runtime), and there is no check for GPU devices being non-Intel. To have more control, users can create a DNNL engine by passing a SYCL device and context explicitly.
GPU kernels that run longer than a certain time (the limit depends on OS and system settings) may cause the application to appear to hang. Driver or system settings can be configured to disable this timeout and avoid hangs in DPC++ or OpenCL programs, including oneDNN examples:
o On Linux* (See more details at OpenCL™ Driver for Intel® HD, Iris™, and Iris™ Pro Graphics for Linux):
$ sudo bash -c 'echo N > /sys/module/i915/parameters/enable_hangcheck'
o On Windows* (See more details at Timeout Detection and Recovery (TDR) Registry Keys):
Increase the TdrDelay and TdrDdiDelay values in the registry
See also the DPC++ limitations that affect the library.
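The environment-level workarounds in this list can be sketched for a Linux shell as follows. The snippet sets SYCL_BE as described for the RNN issue and then reads the i915 hangcheck parameter without changing it. The variable names hangcheck_path and hangcheck_state are illustrative, and the parameter file is assumed to exist only on systems where the i915 driver is loaded.

```shell
# RNN workaround: route DPC++ programs through the OpenCL GPU runtime.
export SYCL_BE=PI_OPENCL
echo "SYCL_BE=${SYCL_BE}"

# Read-only check of the i915 hangcheck setting (no root required).
# The file is absent when the i915 driver is not loaded.
hangcheck_path=/sys/module/i915/parameters/enable_hangcheck
if [ -r "$hangcheck_path" ]; then
    hangcheck_state=$(cat "$hangcheck_path")
else
    hangcheck_state=unavailable
fi
echo "enable_hangcheck=${hangcheck_state}"
```

Running the check before and after the sudo command shown above is a quick way to confirm that the change was applied in the current boot session.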