Breaking API changes

OpenCL API:
- The OpenCL interoperability API moved to dnnl_ocl.hpp.
- Engine, stream, and memory objects are created from the corresponding OpenCL objects using free functions in the dnnl::ocl_interop namespace.

Threadpool API:
- The threadpool API moved to dnnl_threadpool.hpp.
- The stream object for the threadpool runtime is created using the free function dnnl::threadpool_interop::make_stream.
- Stream attributes were removed.
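Without stream attributes, a threadpool stream is built from an engine and a user-supplied threadpool object passed to dnnl::threadpool_interop::make_stream. That object implements the dnnl::threadpool_interop::threadpool_iface interface, declared in dnnl_threadpool_iface.hpp. The minimal sketch below assumes that interface's contract: parallel_for(n, fn) must call fn(i, n) for every i in [0, n), get_in_parallel() reports whether the caller is already inside a parallel region, and get_flags() returning 0 signals synchronous execution. To stay self-contained it does not derive from the real interface, and it spawns fresh threads on every call instead of keeping a persistent pool. The class name simple_threadpool is hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical pool mirroring the threadpool_iface methods. In a real
// threadpool-runtime build it would derive from
// dnnl::threadpool_interop::threadpool_iface and mark these methods override.
class simple_threadpool {
public:
    explicit simple_threadpool(int nthreads) : nthreads_(std::max(1, nthreads)) {}

    int get_num_threads() const { return nthreads_; }

    // True only on worker threads started by parallel_for.
    bool get_in_parallel() const { return in_parallel_flag(); }

    // 0 = synchronous: parallel_for returns only after all tasks finish.
    std::uint64_t get_flags() const { return 0; }

    // Runs fn(i, n) exactly once for each i in [0, n).
    void parallel_for(int n, const std::function<void(int, int)> &fn) {
        assert(n >= 0);
        std::atomic<int> next{0};
        std::vector<std::thread> workers;
        const int nworkers = std::min(n, nthreads_);
        for (int w = 0; w < nworkers; ++w) {
            workers.emplace_back([&] {
                in_parallel_flag() = true;
                // Each worker claims the next unprocessed index until none remain.
                for (int i = next++; i < n; i = next++) fn(i, n);
            });
        }
        for (auto &t : workers) t.join();
    }

private:
    static bool &in_parallel_flag() {
        thread_local bool flag = false;
        return flag;
    }
    int nthreads_;
};
```

In a oneDNN build configured for the threadpool runtime, a pointer to such an object would be passed as the second argument, for example dnnl::threadpool_interop::make_stream(eng, &pool). Returning 0 from get_flags keeps the sketch simple, because oneDNN then does not need to handle asynchronous task completion.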

New Functionality

- Introduced SYCL API extensions compliant with the oneAPI specification v1.0.
- Introduced support for the Intel(R) DPC++ Compiler and the Level Zero runtime.
- Introduced Unified Shared Memory (USM) support for Intel Processor Graphics and Xe architecture-based graphics.

Known Issues and Limitations

Pooling, batch normalization, and binary primitives may segfault when executed on Xe architecture-based graphics. No workaround is available.
Non-Intel GPUs are not supported. The library API allows creating a oneDNN engine by device index (the device order is determined by the SYCL runtime), and the library does not check whether the selected GPU is an Intel device. For more control, create the engine by passing the SYCL device and context explicitly.
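Because engine creation by index does not check the vendor, one option is to pick the device yourself before creating the engine, for example with dnnl::sycl_interop::make_engine(device, context) from dnnl_sycl.hpp. The sketch below shows only the selection step. It assumes the device properties have already been queried from the runtime (in SYCL, via device::is_gpu() and device::get_info<sycl::info::device::vendor_id>()); the device_desc struct and find_intel_gpu function are hypothetical stand-ins that keep the example self-contained. 0x8086 is Intel's PCI vendor ID.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical summary of one device as enumerated by the runtime.
struct device_desc {
    std::string name;
    bool is_gpu;
    std::uint32_t vendor_id;
};

constexpr std::uint32_t intel_vendor_id = 0x8086; // Intel PCI vendor ID

// Returns the position of the first Intel GPU in enumeration order,
// or std::nullopt when no Intel GPU is present.
std::optional<std::size_t> find_intel_gpu(const std::vector<device_desc> &devices) {
    for (std::size_t i = 0; i < devices.size(); ++i) {
        if (devices[i].is_gpu && devices[i].vendor_id == intel_vendor_id) return i;
    }
    return std::nullopt;
}
```

Selecting by vendor rather than by a fixed index means the choice does not depend on the order in which the runtime enumerates devices.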
GPU kernels that run longer than a certain time limit (which depends on the OS and system settings) may make the application appear to hang. You can configure the driver or system settings to disable this timeout and avoid hangs in DPC++ or OpenCL programs, including the oneDNN examples:
- On Linux* (for details, see OpenCL™ Driver for Intel® HD, Iris™, and Iris™ Pro Graphics for Linux):
  $ sudo bash -c 'echo N > /sys/module/i915/parameters/enable_hangcheck'
- On Windows* (for details, see Timeout Detection and Recovery (TDR) Registry Keys):
  Increase the TdrDelay and TdrDdiDelay values in the registry.
See also the DPC++ limitations, which affect the library as well.