This is a patch release containing the following changes to v3.8.1:
- Fixed performance regression for `f32` convolution primitive on processors with Intel AVX-512 instruction set support (5f3af68)
- Introduced support for `f16` destination in `int8` matmul and `int8` inner product on x64 CPUs (53fd12a, 22e252c, f5b2d7f, e4e2f1c)
- Improved RNN primitive performance on processors with Intel AVX2 instruction set support (71e5d81, eb27db2, dd4e627, ff134e0, 5a86c1f, e9395ae)
- Improved `fp32` matmul performance on processors with Intel AVX-512 instruction set support (1119339)
- Fixed segmentation fault in `f32` binary primitive with broadcast on x64 processors (2082e98)
- Fixed correctness issue in `f64` convolution weight gradient with bias on Intel Arc GPUs (a00bfab)
- Updated `spdlog` component to version 1.15.3 (dbb3629)
- Fixed potential undefined behavior in convolution on Intel GPUs (5ac3e31)
- Fixed segmentation fault in convolution implementation with trivial filter on Intel CPUs (908c5fc, f0a0eee)
- Fixed segmentation fault in `f16` convolution with odd dimensions on processors with Intel AVX10.1 instruction set support (78d6835)
- Improved convolution primitive descriptor creation time on x64 processors (e9c5366, fd9dc58, f1d038e)
- Fixed performance regression in `f16` matmul with `int4` weights on Intel Arc Graphics B-series (38d761b)
- Improved `bf16` matmul performance on processors with Intel AMX instruction set support (0887aec)
- Fixed correctness issue in `f32` RNN primitive on processors with Intel AMX instruction set support (460a014)