github amd/blis 5.2.2
AOCL 5.2.2 Release

AOCL-BLAS 5.2.2 Release Notes

Overview

AOCL-BLAS 5.2.2 is an incremental release building on the 5.2 GA release, delivering performance optimizations, bug fixes, improved threading stability, and expanded test coverage.

Performance Optimizations

  • Optimized SGEMM rd kernels on Zen3
  • Improved SGEMM rd kernel on Zen4/Zen5
  • SGEMM tiny path tuning for Zen4 and Zen5
  • Added tiny path for SGEMM
  • Added fast path for single-threaded AVX512 DGEMV kernel
  • Tuned decision logic for DGEMV multithreading for skinny sizes
  • Tuned DGEMV no-transpose thresholds
  • Re-tuned GEMV thresholds
  • Optimal rerouting of GEMV inputs to avoid packing
  • Tuned input threshold for tiny DGEMM interface
  • Tuned ZGEMM threshold for Zen5
  • Changed ZGEMM SUP threshold logic for Zen5 to fix performance regression
  • Replaced intrinsics with inline assembly for bli_saxpyv_zen4_int and bli_saxpyf_zen_int_5
  • Improved fringe case handling for AXPYV kernel
  • Disabled small_gemm for Zen4/Zen5 and added single-thread check for tiny path
  • Disabled GEMV(M1) rerouting in BF16 APIs (AVX512)

Bug Fixes

  • Fixed memory leak in DGEMV kernel
  • Fixed extreme values handling in GEMV
  • Fixed integer division in GEMV that was supposed to be a double operation
  • Fixed integer overflow issue in TPSV
  • Fixed out-of-bound access in F32 matrix add/mul ops
  • Fixed BF16 to F32 conversion in AVX2 F32 codepath
  • Fixed bug in BF16 AVX2 conversion path
  • Fixed F32 to BF16 conversion and AVX512 ISA support checks
  • Fixed cblas_ctrmm invalid diag handling
  • Fixed Coverity issue in ZTRSM
  • Fixed Coverity static analysis issue in DTRSM
  • Fixed high priority Coverity issues in LPGEMM
  • Fixed Coverity issues with CID: 23269 and CID: 137049
  • Resolved operator precedence warning in Zen5 DCOMPLEX threshold logic
  • Modified AXPY kernel to ensure consistency of numerical results
  • Added GCC 15 SUP kernel workaround (2)
  • Disabled no post-ops path in LPGEMM F32 kernels for certain GCC versions
  • Updated guards in s8s8s32of32_sym_quant framework
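
Two of the fixes above (integer division in GEMV, integer overflow in TPSV) belong to well-known classes of C arithmetic bugs. The release notes do not show the affected code, so the following is only a minimal sketch of how such bugs can arise; the function names `ratio_truncated`, `ratio_exact`, and `packed_len` are hypothetical and are not AOCL-BLAS symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not AOCL-BLAS code: both operands are integers,
   so the quotient is truncated before it is converted to double. */
double ratio_truncated(int64_t m, int64_t n)
{
    assert(n != 0);
    return (double)(m / n);
}

/* Converting an operand first makes the division a double operation. */
double ratio_exact(int64_t m, int64_t n)
{
    assert(n != 0);
    return (double)m / (double)n;
}

/* Element count of an n x n triangular matrix in packed storage,
   n*(n+1)/2, computed in 64-bit arithmetic. For n = 65536 the
   intermediate product n*(n+1) is 4295032832, which does not fit in a
   32-bit signed integer. */
int64_t packed_len(int64_t n)
{
    assert(n >= 0);
    return n * (n + 1) / 2;
}
```

With these definitions, `ratio_truncated(7, 2)` yields 3.0 while `ratio_exact(7, 2)` yields 3.5, and `packed_len(65536)` exceeds `INT32_MAX`, which is why index arithmetic for packed routines is usually done in a 64-bit type.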

Threading & Stability

  • Fixed data race in native code-path
  • Added OpenMP barrier before releasing threadinfo & global communicator to avoid race
  • Replaced OMP barrier with bli_thread_barrier and added similar fixes
  • Global communicator is now freed outside the parallel region
  • Thread: global communicator is freed after the parallel region completes
  • Initialized mem_t structures safely and handled NULL communicator in threading
  • Fixed DTL dynamic thread logging in BLAS operations
  • Added dynamic and actual thread counts to the DTL log of SAXPY
  • Enabled disable-sba-pools feature in AOCL-BLAS
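
Several items above concern releasing a shared resource only after every thread has finished with it. As a rough illustration of that general pattern (not the BLIS implementation), the sketch below allocates a shared buffer before an OpenMP parallel region and frees it only after the region ends. The function name `sum_via_shared_buffer` and the buffer itself are hypothetical stand-ins for the global communicator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical example, not BLIS code: a buffer shared by all threads
   plays the role of the global communicator. Requires n > 0. */
long sum_via_shared_buffer(int n)
{
    assert(n > 0);
    long total = 0;
    long *shared = malloc((size_t)n * sizeof *shared);
    assert(shared != NULL);

    #pragma omp parallel
    {
        /* Each thread fills part of the buffer; the worksharing loop
           ends with an implicit barrier, so all writes are complete
           before any thread starts reading. */
        #pragma omp for
        for (int i = 0; i < n; ++i)
            shared[i] = i;

        #pragma omp for reduction(+:total)
        for (int i = 0; i < n; ++i)
            total += shared[i];
    }

    /* The parallel region ends with an implicit barrier, so no thread
       can still be using the buffer when it is freed here. */
    free(shared);
    return total;
}
```

Freeing inside the region would need an explicit barrier first, since a fast thread could otherwise release memory that a slower thread is still reading. The code also compiles without OpenMP enabled, in which case the pragmas are ignored and it runs serially.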

Build System & Infrastructure

  • Updates to the build systems (CMake and Make) for LPGEMM compilation
  • CMake: Added targets and aliases so that BLIS works with FetchContent
  • Enabled security flags by default
  • Added getpid support for DTL on Windows
  • Added compiler information to make showconfig and bench_getlibraryInfo
  • Made all bench applications consistent
  • Standardized Zen kernel names

Test Suite (GTestSuite)

  • Added Banded API tests: gbmv, hbmv, sbmv, tbmv, tbsv
  • Added Packed API tests: hpmv, spmv, tpmv, tpsv, hpr, hpr2, spr, spr2
  • Added conjugate dot and ger IIT_ERS tests
  • Added data pool support
  • Moved data generator definitions to a cpp file
  • Computediff improvements
  • Fix in swap
  • Broke up tests for better organization
  • Multiple miscellaneous test fixes
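
The banded routines covered by the new tests operate on the standard BLAS band storage layout. As a rough illustration of the kind of reference computation a gbmv test could compare against (this is not the GTestSuite code, and `ref_dgbmv_n` is a hypothetical name), the sketch below computes y := A*x for the no-transpose case with unit strides.

```c
#include <assert.h>
#include <stddef.h>

/* Reference y := A*x for an m x n band matrix in BLAS band storage
   (column-major, kl sub-diagonals, ku super-diagonals, lda >= kl+ku+1).
   With 0-based indices, element A(i,j) is stored at
   ab[(ku + i - j) + j*lda] for max(0, j-ku) <= i <= min(m-1, j+kl). */
void ref_dgbmv_n(int m, int n, int kl, int ku,
                 const double *ab, int lda,
                 const double *x, double *y)
{
    assert(lda >= kl + ku + 1);

    for (int i = 0; i < m; ++i)
        y[i] = 0.0;

    for (int j = 0; j < n; ++j) {
        /* Rows of column j that fall inside the band. */
        int i_lo = (j - ku > 0) ? j - ku : 0;
        int i_hi = (j + kl < m - 1) ? j + kl : m - 1;

        for (int i = i_lo; i <= i_hi; ++i)
            y[i] += ab[(size_t)(ku + i - j) + (size_t)j * (size_t)lda] * x[j];
    }
}
```

For example, the tridiagonal matrix with rows (1 2 0), (3 4 5), (0 6 7) and kl = ku = 1 is stored column by column as {0,1,3, 2,4,6, 5,7,0}, where the two zeros are unused padding slots.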

Documentation

  • Fixed documentation on building bench
  • Added external PR integration process and flowchart to CONTRIBUTING.md
  • Updated LICENSE and NOTICES files for AOCL-5.2 release

Compiler Warnings

  • Multiple rounds of compiler warning fixes
  • Added bli_print_msg before bli_abort() for bli_thrinfo_sup_create_for_cntl
  • DTL log updates
  • Code tidying
