This is a patch release containing the following changes to v3.9.1:
- Fixed correctness issue in `int8` convolution on processors with Intel AVX2 and Intel DL Boost instruction set support (a7c4079, 78e781f)
- Fixed performance regression for `f32` convolution primitive on processors with Intel AVX-512 instruction set support (74f23b4)
- Fixed performance regression for RNN primitive with LBR GRU cell type on Intel Arc GPUs (ae2844e)
- Fixed performance regression for `int8` convolution primitive when using zero points (dbb8484)
- Fixed segmentation fault in matmul primitive when using `ONEDNN_VERBOSE=all` (7310aa2)
- Fixed correctness issue in multi-dimensional matmul primitive on Intel Xeon processors with Intel AMX instruction set support (formerly Sapphire Rapids and Granite Rapids) (642d18b)
- Reduced problem size in `test_sdpa_decomp` test (9bff06e)
- Restricted `test_sdpa_decomp` and `test_mqa_decomp` tests to `OMP` or `THREADPOOL` CPU runtimes (3cd9170)
- Fixed illegal instruction issue in pooling primitive on processors with Intel SSE4.1 support (d907c47)
- Fixed segmentation fault issue in `f16` backward convolution primitive on processors with Intel AVX2 with Intel DL Boost with float16 and bfloat16 support (50cc228, fcc7e5e)
- Restored support for `int8` matmul with `per_oc` scales and zero points on Intel Arc GPUs (1a5a454, 04c22c9)