AOCL-BLAS 5.2 Release Notes
Overview
This release includes significant performance improvements, new features, and critical bug fixes for the AOCL-BLAS linear algebra library, with optimizations specifically targeting AMD Zen4 and Zen5 architectures.
Performance Improvements
GEMM Improvements
- Tuned ZGEMM thresholds for Zen4 and Zen5 architectures
- Optimized AVX512 ZGEMM kernel and edge-case handling
- Improved ZGEMM packing kernel for M-dimension edge cases
- Developed optimal thread selection logic for ZGEMM on Zen5
GEMV Enhancements
- Added DGEMV no-transpose multithreaded implementations
- Exported AVX512 DGEMV kernels
- DGEMV bug fixes and code cleanup
- Added support for non-unit incx in the GEMV transpose kernel
- Improved numerical precision in ZGEMV API
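For readers unfamiliar with the term, a non-unit incx means that consecutive logical elements of the input vector x are spaced incx entries apart in memory. The following is a minimal reference sketch in C of a transposed GEMV, y := alpha*A^T*x + beta*y, on a column-major matrix with a strided x. The name ref_dgemv_t is hypothetical and this is not the AOCL-BLAS kernel; it only illustrates the indexing and assumes positive increments.

```c
#include <stddef.h>

/* Hypothetical reference routine, not part of AOCL-BLAS.
 * Computes y := alpha * A^T * x + beta * y for a column-major
 * m x n matrix A with leading dimension lda.
 * x holds m logical elements spaced incx apart in memory;
 * y holds n logical elements spaced incy apart.
 * Assumes incx > 0 and incy > 0. */
static void ref_dgemv_t(size_t m, size_t n, double alpha,
                        const double *a, size_t lda,
                        const double *x, size_t incx,
                        double beta, double *y, size_t incy)
{
    for (size_t j = 0; j < n; ++j) {
        /* Dot product of column j of A with the strided vector x. */
        double dot = 0.0;
        for (size_t i = 0; i < m; ++i) {
            dot += a[i + j * lda] * x[i * incx];
        }
        y[j * incy] = alpha * dot + beta * y[j * incy];
    }
}
```

With incx = 2, for example, the routine reads x[0], x[2], x[4], ... and skips the entries in between, which is the access pattern a strided caller expects.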
DCOPY Optimization
- Tuned DCOPY aocl_dynamic logic for Zen4/Zen5 architectures
New Features
- Additional build options to disable optimized code paths for smaller matrices in GEMM and TRSM
  - Useful for testing and benchmarking
  - Reduces numerical rounding differences when repeating calculations with different core counts
- Complete set of GEMMTR APIs implemented
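As background on GEMMTR, the reference BLAS operation of that name computes C := alpha*op(A)*op(B) + beta*C while accessing and updating only the upper or lower triangle of the n x n matrix C. The sketch below illustrates that idea in C for the no-transpose, double-precision case. The name ref_dgemmtr_nn and its argument list are hypothetical and simplified; they are not the AOCL-BLAS interface.

```c
#include <stddef.h>

/* Hypothetical reference routine, not the AOCL-BLAS interface.
 * Computes C := alpha * A * B + beta * C for column-major
 * A (n x k), B (k x n) and C (n x n), but reads and writes only
 * the triangle of C selected by uplo: 'U' for upper, anything
 * else for lower. Transposition options are omitted for brevity. */
static void ref_dgemmtr_nn(char uplo, size_t n, size_t k, double alpha,
                           const double *a, size_t lda,
                           const double *b, size_t ldb,
                           double beta, double *c, size_t ldc)
{
    for (size_t j = 0; j < n; ++j) {
        /* Row range of column j that lies inside the selected triangle. */
        size_t i_begin = (uplo == 'U') ? 0 : j;
        size_t i_end   = (uplo == 'U') ? j + 1 : n;
        for (size_t i = i_begin; i < i_end; ++i) {
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p) {
                acc += a[i + p * lda] * b[p + j * ldb];
            }
            c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc];
        }
    }
}
```

Entries of C outside the selected triangle are left untouched, which is what distinguishes this operation from a full GEMM when the result is known to be symmetric or only one triangle is needed.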
Bug Fixes
Critical Fixes
- Fixed potential integer overflow in TPSV
- Fixed ZTRSM accuracy for conjugate transpose
- Fixed DTRSM small threshold for extremely skinny sizes on Zen5
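These notes do not describe the TPSV fix itself. As general background on why packed routines are prone to integer overflow, TPSV operates on a triangular matrix in packed storage, which holds n*(n+1)/2 elements, and that count can exceed the range of a 32-bit integer for large n. The C sketch below shows one common way to compute it safely in 64-bit arithmetic. The helper names are hypothetical and are not taken from the AOCL-BLAS sources.

```c
#include <stdint.h>

/* Hypothetical helper: number of elements in packed storage of an
 * n x n triangular matrix. The operand is widened to 64 bits first,
 * so the product n * (n + 1) cannot wrap for any non-negative
 * 32-bit n. */
static int64_t packed_len(int32_t n)
{
    int64_t n64 = n;
    return n64 * (n64 + 1) / 2;
}

/* Hypothetical helper: returns 1 if the packed length fits in a
 * signed 32-bit integer, 0 otherwise. */
static int packed_len_fits_int32(int32_t n)
{
    return packed_len(n) <= INT32_MAX;
}
```

Under these assumptions, the packed length still fits in a signed 32-bit integer at n = 65535 and first exceeds it at n = 65536, so any index arithmetic done in 32 bits beyond that size would need widening.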
Acknowledgments
This release is the result of contributions from the AOCL team at AMD and the broader BLIS community.
Release Date: January 2026
Version: 5.2 GA