This is a patch release containing the following changes to v3.8.1:
- Fixed performance regression for f32 convolution primitive on processors with Intel AVX-512 instruction set support (5f3af68)
- Introduced support for f16 destination in int8 matmul and int8 inner product on x64 CPUs (53fd12a, 22e252c, f5b2d7f, e4e2f1c)
- Improved RNN primitive performance on processors with Intel AVX2 instruction set support (71e5d81, eb27db2, dd4e627, ff134e0, 5a86c1f, e9395ae)
- Improved fp32 matmul performance on processors with Intel AVX-512 instruction set support (1119339)
- Fixed segmentation fault in f32 binary primitive with broadcast on x64 processors (2082e98)
- Fixed correctness issue in f64 convolution weight gradient with bias on Intel Arc GPUs (a00bfab)
- Updated spdlog component to version 1.15.3 (dbb3629)
- Fixed potential undefined behavior in convolution on Intel GPUs (5ac3e31)
- Fixed segmentation fault in convolution implementation with trivial filter on Intel CPUs (908c5fc, f0a0eee)
- Fixed segmentation fault in f16 convolution with odd dimensions on processors with Intel AVX10.1 instruction set support (78d6835)
- Improved convolution primitive descriptor creation time on x64 processors (e9c5366, fd9dc58, f1d038e)
- Fixed performance regression in f16 matmul with int4 weights on Intel Arc Graphics B-series (38d761b)
- Improved bf16 matmul performance on processors with Intel AMX instruction set support (0887aec)
- Fixed correctness issue in f32 RNN primitive on processors with Intel AMX instruction set support (460a014)