OpenBLAS v0.3.29 (GitHub: OpenMathLib/OpenBLAS)

general:

  • fixed a potential NULL pointer dereference in multithreaded builds
  • added function aliases for GEMMT using its new name GEMMTR adopted by Reference-BLAS
  • fixed a build failure when building without LAPACK_DEPRECATED functions
  • the minimum required CMake version for CMake-based builds was raised to 3.16.0 in order
    to remove many compatibility and deprecation warnings
  • added more detailed CMake rules for OpenMP builds (mainly to support recent LLVM)
  • fixed the behavior of the recently added CBLAS_?GEMMT functions with row-major data
  • improved thread scaling of multithreaded SBGEMV
  • improved thread scaling of multithreaded TRTRI
  • fixed compilation of the CBLAS testsuite with gcc14 (and no Fortran compiler)
  • added support for option handling changes in flang-new from LLVM18 onwards
  • added support for recent calling-convention changes in Cray and NVIDIA compilers
  • added support for compilation with the NAG Fortran compiler
  • fixed placement of the -fopenmp flag and libsuffix in the generated pkgconfig file
  • improved the CMakeConfig file generated by the Makefile build
  • fixed const-correctness of cblas_?geadd in cblas.h
  • fixed a potential inaccuracy in multithreaded BLAS3 calls
  • fixed empty implementations of get/set_affinity that print a warning in OpenMP builds
  • fixed function signatures for TRTRS in the converted C version of LAPACK
  • fixed omission of several single-precision LAPACK symbols in the shared library
  • improved build instructions for the provided "pybench" benchmarks
  • improved documentation, including added build instructions for WoA and HarmonyOS
    as well as descriptions of environment variables that affect build and runtime behavior
  • added a separate "make install_tests" target for use with cross-compilations
  • integrated improvements and corrections from Reference-LAPACK:
    • removed a comparison in LAPACKE ?tpmqrt that is always false (LAPACK PR 1062)
    • fixed the leading dimension for B in tests for GGEV (LAPACK PR 1064)
    • replaced the ?LARFT functions with a recursive implementation (LAPACK PR 1080)
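
GEMMT (now also exported under the Reference-BLAS name GEMMTR) computes a general matrix product but updates only the upper or lower triangle of the square result C. The semantics, including the row-major case covered by the CBLAS_?GEMMT fix, can be sketched in C as follows. The names gemmt_ref and gemmt_ref_rowmajor are hypothetical and are not OpenBLAS entry points. Transposed operands are omitted, and the row-major mapping is one standard way to reduce row-major data to a column-major call, not necessarily how OpenBLAS implements it.

```c
#include <stddef.h>

/* Hypothetical reference sketch of GEMMT/GEMMTR semantics, column-major,
 * no transposes: C := alpha * A * B + beta * C, restricted to the triangle
 * of the n x n matrix C selected by `upper`; the other triangle is untouched.
 * A is n x k (lda >= n), B is k x n (ldb >= k), C is n x n (ldc >= n). */
void gemmt_ref(int upper, size_t n, size_t k, double alpha,
               const double *A, size_t lda, const double *B, size_t ldb,
               double beta, double *C, size_t ldc)
{
    for (size_t j = 0; j < n; ++j) {
        /* Rows of column j that belong to the requested triangle. */
        size_t i0 = upper ? 0 : j;
        size_t i1 = upper ? j + 1 : n;
        for (size_t i = i0; i < i1; ++i) {
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p)
                acc += A[i + p * lda] * B[p + j * ldb];
            /* Usual BLAS convention: C is not read when beta == 0. */
            double old = (beta == 0.0) ? 0.0 : beta * C[i + j * ldc];
            C[i + j * ldc] = alpha * acc + old;
        }
    }
}

/* Hypothetical row-major variant (lda >= k, ldb >= n, ldc >= n).
 * Row-major storage of C is column-major storage of C^T, and
 * C^T = alpha * B^T * A^T + beta * C^T. Read column-major, row-major
 * B (k x n) is B^T (n x k) and row-major A (n x k) is A^T (k x n).
 * The upper triangle of C is the lower triangle of C^T, so both the
 * operand order and the triangle flip. */
void gemmt_ref_rowmajor(int upper, size_t n, size_t k, double alpha,
                        const double *A, size_t lda, const double *B, size_t ldb,
                        double beta, double *C, size_t ldc)
{
    gemmt_ref(!upper, n, k, alpha, B, ldb, A, lda, beta, C, ldc);
}
```

For example, with column-major A = [[1, 3], [2, 4]], B = [[5, 7], [6, 8]], alpha = 1, beta = 0 and the upper triangle selected, C receives 23, 31 and 46 in its upper triangle while its strictly lower entry keeps its previous value.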

arm:

  • fixed build with recent versions of the NDK (missing .type declaration of symbols)

arm64:

  • fixed a long-standing bug in the (generic) c/zgemm_beta kernel that could lead to
    reads and writes outside the array bounds in some circumstances
  • rewrote cpu autodetection to scan all cores and return the highest performing type
  • improved the DGEMM performance for SVE targets and small matrix sizes
  • improved dimension criteria for forwarding from GEMM to GEMV kernels
  • added SVE kernels for ROT and SWAP
  • improved SVE kernels for SGEMV and DGEMV on A64FX and NEOVERSEV1
  • added support for using the "small matrix" kernels with CMake as well
  • fixed compilation on Windows on Arm
  • improved compile-time detection of SVE capability
  • added cpu autodetection and initial support for Apple M4
  • added support for compilation on systems running iOS
  • added support for compilation on NetBSD ("evbarm" architecture)
  • fixed NRM2 implementations for generic SVE targets and the Neoverse N2
  • fixed compilation for SVE-capable targets with the NVIDIA compiler

x86_64:

  • fixed a wrong storage size in the SBGEMV kernel for Cooper Lake
  • added cpu autodetection for Intel Granite Rapids
  • added cpu autodetection for AMD Ryzen 5 series
  • added optimized SOMATCOPY_CT for AVX-capable targets
  • fixed the fallback implementation of GEMM3M in GENERIC builds
  • tentatively re-enabled builds with the EXPRECISION option
  • worked around a miscompilation of tests with mingw32-gfortran14
  • added support for compilation with the Intel oneAPI 2025.0 compiler on Windows
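
The OMATCOPY kernels implement OpenBLAS's out-of-place matrix copy extension; the _CT suffix denotes column-major input with transposition, i.e. B := alpha * A^T. A minimal C sketch of that operation follows, assuming A is rows x cols and B is cols x rows, both column-major. The function name somatcopy_ct_ref and the tile size are hypothetical; tiling is shown only because transposition otherwise strides through one of the two arrays, and it is not a description of the AVX kernel itself.

```c
#include <stddef.h>

#define TILE 32 /* hypothetical tile size, not taken from OpenBLAS */

/* Hypothetical sketch: B := alpha * A^T.
 * A is rows x cols, column-major, lda >= rows.
 * B is cols x rows, column-major, ldb >= cols.
 * Working tile by tile keeps both the reads of A and the writes of B
 * within a small block at a time. */
void somatcopy_ct_ref(size_t rows, size_t cols, float alpha,
                      const float *a, size_t lda, float *b, size_t ldb)
{
    for (size_t j0 = 0; j0 < cols; j0 += TILE)
        for (size_t i0 = 0; i0 < rows; i0 += TILE) {
            size_t j1 = (j0 + TILE < cols) ? j0 + TILE : cols;
            size_t i1 = (i0 + TILE < rows) ? i0 + TILE : rows;
            for (size_t j = j0; j < j1; ++j)
                for (size_t i = i0; i < i1; ++i)
                    /* B(j, i) = alpha * A(i, j) */
                    b[j + i * ldb] = alpha * a[i + j * lda];
        }
}
```

For example, the 2 x 3 matrix A = [[1, 3, 5], [2, 4, 6]] with alpha = 2 yields the 3 x 2 matrix B = [[2, 4], [6, 8], [10, 12]].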

power:

  • fixed multithreaded SBGEMM
  • fixed a CMake build problem on POWER10
  • improved the performance of SGEMV
  • added vectorized implementations of SBGEMV and support for forwarding 1xN SBGEMM to them
  • fixed illegal instructions and potential memory overflow in SGEMM on PPCG4
  • fixed handling of NaN and Inf arguments in SSCAL and DSCAL on PPC440, G4 and 970
  • added improved CGEMM and ZGEMM kernels for POWER10
  • added Makefile logic to remove all optimization flags in DEBUG builds
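
The SCAL entry does not spell out the corrected behavior. Assuming the target is the Reference-BLAS loop, which simply computes x_i := alpha * x_i, the difference from a common kernel shortcut (zero-filling x when alpha == 0) is that IEEE arithmetic gives 0 * NaN = NaN and 0 * Inf = NaN. Both function names below are hypothetical and only contrast the two behaviors.

```c
#include <math.h>   /* NAN, INFINITY and isnan when exercising the functions */
#include <stddef.h>

/* Hypothetical shortcut: alpha == 0 overwrites x with zeros,
 * silently discarding any NaN or Inf already in x. */
void dscal_zero_shortcut(size_t n, double alpha, double *x, size_t incx)
{
    for (size_t i = 0; i < n; ++i)
        x[i * incx] = (alpha == 0.0) ? 0.0 : alpha * x[i * incx];
}

/* Hypothetical plain multiply, as in the Reference-BLAS loop:
 * NaN and Inf entries (and a NaN alpha) propagate per IEEE 754. */
void dscal_ieee(size_t n, double alpha, double *x, size_t incx)
{
    for (size_t i = 0; i < n; ++i)
        x[i * incx] *= alpha;
}
```

With alpha = 0 and x = {1, NAN, INFINITY}, dscal_ieee produces {0, NaN, NaN}, while dscal_zero_shortcut produces {0, 0, 0}.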

mips64:

  • fixed compilation with gcc14
  • fixed GEMM parameter selection for the MIPS64_GENERIC target
  • fixed a potential build failure when compiling with OpenMP

loongarch64:

  • fixed compilation for Loongson3 with recent versions of gmake
  • fixed a potential loss of precision in Loongson3A GEMM
  • fixed a potential build failure when compiling with OpenMP
  • added optimized SOMATCOPY for LASX-capable targets
  • introduced a new cpu naming scheme while retaining compatibility
  • added support for cross-compiling Loongarch64 targets with CMake
  • added support for compilation with LLVM

riscv64:

  • removed thread yielding overhead caused by sched_yield
  • replaced some non-standard intrinsics with their official names
  • fixed and sped up the implementations of CGEMM/ZGEMM TCOPY for vector lengths 128 and 256
  • improved the performance of SNRM2/DNRM2 for RVV1.0 targets
  • added optimized ?OMATCOPY_CN kernels for RVV1.0 targets
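
The _CN variant of OMATCOPY is the non-transposed, column-major case, B := alpha * A, where A and B may have different leading dimensions. A minimal C sketch follows; the name domatcopy_cn_ref is hypothetical. The inner loop walks one contiguous column at a time, which is the access pattern a vector (RVV) kernel can process in strips.

```c
#include <stddef.h>

/* Hypothetical sketch: B := alpha * A, both rows x cols, column-major,
 * with lda >= rows and ldb >= rows. Each column is copied contiguously. */
void domatcopy_cn_ref(size_t rows, size_t cols, double alpha,
                      const double *a, size_t lda, double *b, size_t ldb)
{
    for (size_t j = 0; j < cols; ++j) {
        const double *acol = a + j * lda;
        double *bcol = b + j * ldb;
        for (size_t i = 0; i < rows; ++i)
            bcol[i] = alpha * acol[i];
    }
}
```

For example, copying a 2 x 2 block stored with lda = 3 into a packed buffer with ldb = 2 and alpha = 0.5 skips the padding element at the end of each source column.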

md5sums
d7df28656fa28616da028d1e94eab216 OpenBLAS-0.3.29.zip
853a0c5c0747c5943e7ef4bbb793162d OpenBLAS-0.3.29.tar.gz
195aff920ba64329cbd358a54d4fefc5 OpenBLAS-0.3.29_x64.zip
bd44474436a81e1d7ac66a5d38124d24 OpenBLAS-0.3.29_x64_64.zip
483119550c3414cbc44027a6c782f581 OpenBLAS-0.3.29_x86.zip
