This is a patch release containing the following changes to v3.9:
- Reduced sizes in Graph API SDPA examples (257d689)
- Fixed correctness issue in `bf16` depthwise convolution with `bf16` bias on AArch64 CPUs (218b41d)
- Changed Intel GPU data alignment check from error to warning (5c5008a)
- Improved `bf16` matmul performance on processors with Intel AMX instruction set support (54b6354, 30c4d8d)
- Fixed PowerPC64 build by adding `-mcpu=power10` and `-mmma` flags (02ca915)
- Introduced support for `f16` destination in `int8` matmul and `int8` inner product on x64 CPUs (a62ed6b, 53c0a66, 0750043, 4f0f068)
- Introduced support for `per_tensor` zero-points in `int8` matmul on Intel GPUs (db8e8ff, f783164, 4d458df, 80453a0, 7f90d50, a2200e2)
- Fixed correctness issue in `int8` reorder for cases with compensation on x64 CPUs (771ca54)