github amd/blis 4.2
AOCL-BLAS 4.2

2 months ago

AOCL-BLAS 4.2 Release Highlights

  • Added uint8 output and zero-point support to the int8 APIs in the aocl_gemm addon (Low Precision GEMM / LPGEMM)
  • Improved performance of the downscaled versions of all APIs in the aocl_gemm addon (Low Precision GEMM / LPGEMM)
  • Improved multithreaded performance across the APIs in the aocl_gemm addon (Low Precision GEMM / LPGEMM)
  • Introduced the AOCL_ENABLE_INSTRUCTIONS environment variable as an alternative to BLIS_ARCH_TYPE, with slightly different semantics.
  • Improved the functionality of the XERBLA error-handling routine in AOCL-BLAS.
  • Performance optimizations for the following APIs:
    - DGEMM for tiny sizes
    - S/ZGEMM, D/ZTRSM, ZAXPBYV, Z/ZDSCALV, S/D/ZGEMV, and D/DZNRM2
  • The following BLAS extension APIs have been added, for AMD “Zen” code paths only:
    - sgemm_pack_get_size(), sgemm_pack(), and sgemm_compute()
    - dgemm_pack_get_size(), dgemm_pack(), and dgemm_compute()
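
The pack and compute routines listed above suggest a pack-once, multiply-many workflow: query the buffer size, reorder a reused matrix into a packed buffer one time, then run several multiplications against that packed copy. The sketch below illustrates only that three-step flow in plain C. The helpers `toy_pack_get_size`, `toy_pack_b`, `toy_compute`, and `toy_demo` are hypothetical names invented for this illustration; they are not the AOCL-BLAS signatures, which should be taken from the AOCL-BLAS headers and documentation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helpers that mimic the three-step flow; NOT the AOCL-BLAS API. */

/* Step 1: report how many bytes the packed copy of a k x n matrix needs. */
static size_t toy_pack_get_size(int k, int n) {
    return (size_t)k * (size_t)n * sizeof(float);
}

/* Step 2: copy column-major B (k x n, leading dimension ldb) into a
   contiguous buffer, applying alpha once at pack time. */
static void toy_pack_b(int k, int n, float alpha,
                       const float *b, int ldb, float *packed) {
    for (int j = 0; j < n; ++j)
        for (int p = 0; p < k; ++p)
            packed[(size_t)j * k + p] = alpha * b[(size_t)j * ldb + p];
}

/* Step 3: C = A * packedB + beta * C, with A and C column-major.
   When beta is zero, C is overwritten without being read. */
static void toy_compute(int m, int n, int k,
                        const float *a, int lda, const float *packed,
                        float beta, float *c, int ldc) {
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += a[(size_t)p * lda + i] * packed[(size_t)j * k + p];
            float *cij = &c[(size_t)j * ldc + i];
            *cij = (beta == 0.0f) ? acc : acc + beta * (*cij);
        }
    }
}

/* Packs B once and reuses it for two different A matrices.
   c1 and c2 must each hold 4 floats. Returns 0 on success, 1 on
   allocation failure. */
static int toy_demo(float *c1, float *c2) {
    const int m = 2, n = 2, k = 2;
    const float b[]  = {1, 2, 3, 4};   /* column-major B = [1 3; 2 4] */
    const float a1[] = {1, 0, 0, 1};   /* identity */
    const float a2[] = {2, 0, 0, 2};   /* 2 * identity */

    float *packed = malloc(toy_pack_get_size(k, n));
    if (packed == NULL)
        return 1;

    toy_pack_b(k, n, 1.0f, b, k, packed);               /* pack once */
    toy_compute(m, n, k, a1, m, packed, 0.0f, c1, m);   /* first reuse */
    toy_compute(m, n, k, a2, m, packed, 0.0f, c2, m);   /* second reuse */

    free(packed);
    return 0;
}
```

The toy packing is a plain copy, so it saves nothing by itself. In an optimized library the packed layout would typically match what the compute kernels read directly, which is where amortizing the packing cost over repeated calls would pay off.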

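The uint8 output and zero-point support mentioned in the highlights can be pictured with a small requantization sketch: int8 products are accumulated in int32, scaled, shifted by a zero point, and saturated to the uint8 range. The helper names `toy_downscale_u8` and `toy_dot_s8_to_u8` are hypothetical, and the order of operations (scale, then add the zero point, then clamp) is an assumption based on common quantization practice, not a description of the aocl_gemm implementation.

```c
#include <stdint.h>

/* Hypothetical helper, NOT part of aocl_gemm: converts one int32
   accumulator to uint8 using a scale factor and a zero point.
   Assumed order: scale, add zero point, clamp to [0, 255], round. */
static uint8_t toy_downscale_u8(int32_t acc, float scale, int32_t zero_point) {
    float v = (float)acc * scale + (float)zero_point;
    if (v < 0.0f)
        v = 0.0f;
    if (v > 255.0f)
        v = 255.0f;
    /* v is non-negative here, so adding 0.5 and truncating rounds
       to the nearest integer. */
    return (uint8_t)(v + 0.5f);
}

/* int8 dot product accumulated in int32, then downscaled to uint8.
   One output element of a low-precision GEMM can be viewed this way. */
static uint8_t toy_dot_s8_to_u8(const int8_t *x, const int8_t *y, int n,
                                float scale, int32_t zero_point) {
    int32_t acc = 0;
    for (int i = 0; i < n; ++i)
        acc += (int32_t)x[i] * (int32_t)y[i];
    return toy_downscale_u8(acc, scale, zero_point);
}
```

Clamping in the floating-point domain before the cast keeps the conversion well defined even when the scaled accumulator falls far outside the uint8 range.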