ROCm/rocBLAS rocm-4.5.0 on GitHub

Improved performance of non-batched and batched syr for all sizes and data types.
Improved performance of non-batched and batched hemv for all sizes and data types.
Improved performance of non-batched and batched symv for all sizes and data types.
Improved memory utilization in the rocblas-bench and rocblas-test gemm functions, increasing possible runtime sizes.
Improved performance of non-batched and batched dot, dotc, and dot_ex for small n, e.g. sdot with n <= 31000.
Improved performance of non-batched and batched trmv for all sizes and matrix types.
Improved performance of non-batched and batched gemv transpose case for all sizes and data types.
Improved performance of sger and dger for all sizes, in particular the larger dger sizes.
Improved performance of syrkx for large sizes, including those in rocBLAS Issue #1184.

Update from C++14 to C++17.
Packaging is split into a runtime package (called rocblas) and a development package (called rocblas-dev for .deb packages and rocblas-devel for .rpm packages). The development package depends on the runtime package. To aid in the transition, the runtime package suggests the development package on all supported OSes except CentOS 7. The suggests feature in packaging is introduced as a deprecated feature and will be removed in a future ROCm release.

For function geam, avoid overflow in offset calculation.
For function syr, avoid overflow in offset calculation.
For function gemv (transpose case), avoid overflow in offset calculation.
For functions ssyrk and dsyrk, allow conjugate-transpose case to match legacy BLAS. Behavior is the same as the transpose case.
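
The offset fixes for geam, syr, and gemv listed above concern index arithmetic that exceeds the range of a 32-bit integer on large matrices. The following is a minimal sketch of that general class of problem, not the actual rocBLAS code: the helper names offset_32 and offset_64 are hypothetical, and a column-major layout with leading dimension lda is assumed. Unsigned 32-bit arithmetic is used in the narrow version so that the wraparound is well defined in C++.

```cpp
#include <cstdint>

// Hypothetical helper (not a rocBLAS function): element offset of (row, col)
// in a column-major matrix, computed entirely in 32-bit arithmetic.
// The product col * lda wraps modulo 2^32 once it exceeds 4294967295.
constexpr std::uint32_t offset_32(std::uint32_t row, std::uint32_t col, std::uint32_t lda)
{
    return row + col * lda;
}

// Same calculation with the operands widened to 64 bits before multiplying,
// so the product is exact for any realistic matrix size.
constexpr std::int64_t offset_64(std::int64_t row, std::int64_t col, std::int64_t lda)
{
    return row + col * lda;
}

// Last element of a 70000 x 70000 matrix: the exact offset needs more than 32 bits.
static_assert(offset_64(69999, 69999, 70000) == 4899999999LL,
              "64-bit offset is exact");
static_assert(static_cast<std::int64_t>(offset_32(69999, 69999, 70000)) != 4899999999LL,
              "32-bit offset has wrapped around");
```

The usual remedy, as assumed here, is to promote the indices to a 64-bit type before the multiplication rather than casting the already-wrapped result.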

ROCm/rocBLAS rocm-4.5.0 rocBLAS 2.41.0 for ROCm 4.5.0 on GitHub