AOCL-BLAS 4.2 Release Highlights
- Added uint8 output and zero-point support to the int8 APIs in the aocl_gemm addon (Low Precision GEMM / LPGEMM)
- Improved performance of the downscaled versions of all APIs in the aocl_gemm addon (Low Precision GEMM / LPGEMM)
- Improved multithreaded performance across the APIs in the aocl_gemm addon (Low Precision GEMM / LPGEMM)
- Introduced the AOCL_ENABLE_INSTRUCTIONS environment variable as an alternative to BLIS_ARCH_TYPE, with slightly different semantics.
- Improved the functionality of the XERBLA error-handling routine in AOCL-BLAS.
- Performance optimizations for the following APIs:
  - DGEMM for tiny sizes
  - S/ZGEMM, D/ZTRSM, ZAXPBYV, Z/ZDSCALV, S/D/ZGEMV, and D/DZNRM2
- The following BLAS extension APIs have been added only for AMD “Zen” code paths:
  - sgemm_pack_get_size(), sgemm_pack(), and sgemm_compute()
  - dgemm_pack_get_size(), dgemm_pack(), and dgemm_compute()
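
The AOCL_ENABLE_INSTRUCTIONS variable introduced in this release is read from the process environment, so it can be set for a single run without changing the shell session. The sketch below is illustrative only: run_with_isa is a hypothetical helper, the value avx2 is an assumed example of an accepted setting, and the stand-in command simply echoes the variable where a real program linked against AOCL-BLAS would go. Consult the AOCL-BLAS documentation for the values the library actually accepts.

```shell
#!/bin/sh
# run_with_isa (hypothetical helper, not part of AOCL-BLAS):
# runs the given command with AOCL_ENABLE_INSTRUCTIONS set for that
# command only, leaving the calling shell's environment unchanged.
run_with_isa() {
    isa="$1"
    shift
    env AOCL_ENABLE_INSTRUCTIONS="$isa" "$@"
}

# Stand-in for an application linked against AOCL-BLAS: it only prints
# the variable as the child process sees it. The value "avx2" is an
# assumed example, not a confirmed list of supported settings.
run_with_isa avx2 sh -c 'echo "AOCL_ENABLE_INSTRUCTIONS=$AOCL_ENABLE_INSTRUCTIONS"'
```

Scoping the variable to one command this way makes it easy to compare runs with different settings side by side, since nothing is exported into the parent shell.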