github ROCm/rocBLAS rocm-4.2.0
rocBLAS-2.38.0 for ROCm 4.2.0

Added

  • Added option to install script to build only rocBLAS clients with a pre-built rocBLAS library
  • Added gemm ext support for the unpacked int8 input layout on gfx908 GPUs
    • Added a new flag, rocblas_gemm_flags::rocblas_gemm_flags_pack_int8x4, to specify that the packed layout is used
      • Set rocblas_gemm_flags_pack_int8x4 when using packed int8x4; it must always be set on GPUs before gfx908.
      • On gfx908 GPUs, unpacked int8 is supported, so this flag does not need to be set.
      • Note that the default flags value 0 selects unpacked int8, which changes the behaviour of int8 gemm relative to ROCm 4.1.0.
  • Added a query function, rocblas_query_int8_layout_flag, to get the preferred int8 layout for gemm on a given device
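
The int8 layout rule described above can be sketched as a small decision helper. This is only an illustration under stated assumptions: the enum and the function choose_int8_layout_flag are hypothetical local stand-ins that do not come from the rocBLAS headers, the value 1 for the packed flag is assumed for the sketch, and the architecture name is assumed to be a string that begins with the gfx target (for example "gfx908", possibly followed by feature suffixes). In real code, calling rocblas_query_int8_layout_flag on the device is likely the safer way to obtain the flag.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Local stand-ins for illustration only. Real code should use the
// rocblas_gemm_flags enum from the rocBLAS headers instead.
enum sketch_gemm_flags : std::uint32_t
{
    sketch_gemm_flags_none        = 0, // default flags 0: unpacked int8 (per the notes above)
    sketch_gemm_flags_pack_int8x4 = 1  // assumed value, used only inside this sketch
};

// Hypothetical helper: choose the int8 layout flag from an architecture name.
// Following the notes above, only gfx908 is treated as supporting unpacked
// int8; every other architecture gets the packed int8x4 flag.
inline sketch_gemm_flags choose_int8_layout_flag(const std::string& arch_name)
{
    const std::string unpacked_arch = "gfx908";
    // Prefix match, so names with feature suffixes after "gfx908" also qualify.
    const bool is_gfx908
        = arch_name.compare(0, unpacked_arch.size(), unpacked_arch) == 0;
    return is_gfx908 ? sketch_gemm_flags_none : sketch_gemm_flags_pack_int8x4;
}

// Small usage example: gfx908 keeps the default flags, older GPUs set the packed flag.
inline void check_examples()
{
    assert(choose_int8_layout_flag("gfx908") == sketch_gemm_flags_none);
    assert(choose_int8_layout_flag("gfx906") == sketch_gemm_flags_pack_int8x4);
}
```

Code that already passed packed int8x4 data with flags 0 under ROCm 4.1.0 would, by this rule, need to start setting the packed flag explicitly.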

Optimizations

  • Improved performance of single precision copy, swap, and scal when incx == 1 and incy == 1.
  • Improved performance of single precision axpy when incx == 1, incy == 1 and batch_count <= 8192.
  • Improved performance of trmm.

Changed

  • Changed cmake_minimum_required to VERSION 3.16.8
