Added
- rocTX support in rocBLAS (not available on Windows or in the static library version on Linux)
- On gfx12, all functions now support full rocblas_int dynamic range for batch_count
- --ninja build option
- Support for GPU_TARGETS cmake variable
Changed
- rocblas-test client removes the stress tests unless YAML-based testing or gtest_filter adds them
- rocblas clients OpenMP default threading is reduced to be less than the logical core count
- gemm_ex testing and timing reuses device memory
- gemm_ex timing initializes matrices on device
Optimized
- Significantly reduced workspace memory requirements for Level 1 ILP64: iamax and iamin
- Reduced workspace memory requirements for Level 1 ILP64: dot, asum, nrm2
- Improved the performance of Level 2 gemv for the problem sizes (TransA == N && m > 2*n) and (TransA == T)
- Improved the performance of Level 3 syrk and herk for the problem size (k > 500 && n < 4000)
Resolved issues
- gfx12: ger, geam, geam_ex, dgmm, trmm, symm, hemm, ILP64 gemm, and larger data support
- Added a gfortran package dependency for Azure Linux OS
- Outdated SLES OS package dependencies (cxxtools and joblib) in install.sh -d
- Code object stripping for RPM packages
Upcoming changes
- Deprecated the cmake variable AMDGPU_TARGETS. Use GPU_TARGETS instead.