This is a patch release containing the following changes to v2.5.2:
- Fixed accuracy issue in GELU post-op (3ff2c3d)
- Added ability to enable code only on non-x64 systems (ff7ae00)
- Fixed issue in reorder primitive on non-x64 systems (5917860)
- Fixed build issue on OSX11 and older CMake (d9c8bbe)
- Fixed assert in reorder primitive (79090bc)
- Documentation fixes (d290758, ee7eacb, 543b8f8)
- Fixed potential division by zero in example for binary primitive (2fffd96)
- Fixed SIGFPE issue in reorder primitive (8c291fc)
- Fixed potential size overflow in inner product primitive (c10f74a)
- Added logic to reduce the number of threads (or tasks spawned, in the case of threadpool) for small shapes (8f885e7, 4053989, 49ec406, 2977360)
- Fixed SEGFAULT issue in matmul primitive (62c1170, a993d52)
- Added bf16 support for sum post-op (3d2c37e)
- Added fp:precise compiler flag for Intel Compiler identified as IntelLLVM (1558a4b)
- Fixed issue in bf16 convolution primitive when fused with binary (b379fd9)
- Fixed issue in backward depthwise convolution (d5e4122, f5cac23, eeaa19c)
- Fixed SEGFAULT in int8 convolution with eltwise post-op (32a629f)
- Fixed NaN issue in bf16 backward inner product (0c5e492)
- Fixed performance regression for binary with broadcast (f79b030, 58ce3c1)