github OpenMathLib/OpenBLAS v0.3.32
OpenBLAS 0.3.32 version

general:

  • Moved the preliminary support for a WebAssembly target to its own WASM
    architecture and WASM128_GENERIC target
  • Fixed a potential performance difference between dedicated compilation for
    a target and its representation in DYNAMIC_ARCH builds by making additional
    cpu-specific parameters available to the DYNAMIC_ARCH configuration
  • Fixed the reimplementation of LAPACK ?GESV to conform to the reference (i.e.
    compute the LU factorization even when NRHS is zero)
  • Improved the error message that is displayed when the compile-time allocation
    of memory buffers is exceeded
  • Fixed a problem with non-serialized accesses to parallelized SYRK by concurrent
    callers
  • Fixed an ABI mismatch in the internal version of CDOT/ZDOT used by the C fallback
    versions of the LAPACK source
  • Improved the f_check script for detecting the Fortran compiler to handle embedded
    dashes in path names
  • Fixed several memory access issues in the utests that were detected by
    AddressSanitizer
  • Fixed Makefile errors in cases where only a subset of precision types was selected
  • Fixed missing function errors in Makefile builds without LAPACK or without threads
  • Fixed a syntax error in the benchmarks Makefile
  • Fixed compiler warnings in the CBLAS testsuite
  • Fixed the OpenMP compiler option used with the Intel ifx compiler
  • Updated the README sections on supported cpus and operating systems, and added
    notes pertaining to Java
  • Updated the documentation page for supported BLAS-like extensions
  • Included fixes from the Reference-LAPACK project:
    • Improved step length selection in the fallback path of ?LAED4
      (Reference-LAPACK PR 1191)
    • Rounded up LWORK and removed redundant type conversions in the GVD
      functions (Reference-LAPACK PR 1202)
    • Fixed internal errors being ignored in the calculation of selected
      eigenvalues (Reference-LAPACK PR 1204)
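
The ?GESV entry above means that the LU factorization of A is still computed and returned when there are no right-hand sides. The following is a minimal pure-Python sketch of that contract, not OpenBLAS code: `gesv_sketch`, its argument layout and its return values are hypothetical names chosen for illustration, and unlike LAPACK the sketch stops at the first zero pivot.

```python
def gesv_sketch(a, b_cols):
    """Solve A X = B by LU with partial pivoting, factorizing A in place.

    a      : square matrix as a list of row lists (overwritten with L and U)
    b_cols : list of right-hand-side columns; may be empty (NRHS = 0)
    Returns (ipiv, info); info > 0 reports an exactly zero pivot.
    """
    n = len(a)
    ipiv = list(range(n))
    # The factorization always runs, however many right-hand sides there are.
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        ipiv[k] = p
        if a[p][k] == 0.0:
            return ipiv, k + 1           # simplification: stop at a zero pivot
        if p != k:
            a[k], a[p] = a[p], a[k]      # swap whole rows, L part included
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]           # multiplier stored in the L part
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    # Solve phase: this loop simply does nothing when b_cols is empty.
    for b in b_cols:
        for k in range(n):               # apply the row interchanges
            b[k], b[ipiv[k]] = b[ipiv[k]], b[k]
        for i in range(n):               # forward substitution, unit lower L
            for j in range(i):
                b[i] -= a[i][j] * b[j]
        for i in reversed(range(n)):     # back substitution with U
            for j in range(i + 1, n):
                b[i] -= a[i][j] * b[j]
            b[i] /= a[i][i]
    return ipiv, 0
```

Calling `gesv_sketch(a, [])` leaves the factors in `a` and the pivots in `ipiv`, so a caller can reuse them for a later solve.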

arm64:

  • Fixed a potential miscompilation of the SDOT/DDOT/DSDOT kernels
  • Fixed DYNAMIC_ARCH compilation with CMake and compilers lacking SVE support
  • Improved the performance of BGEMM and SBGEMM kernels for Neoverse V2
  • Added optimized SSUM and DSUM kernels for Neoverse N1
  • Added preliminary support for Neoverse V3 cpus as NEOVERSEV2
  • Added cpu autodetection of Cortex A725 and X925 cpus
  • Fixed a CMake build problem with flang on macOS
  • Fixed build problems with gcc versions 12 and earlier that do not support fp16
  • Fixed compilation of GEMM kernels for VORTEXM4/ARMV9SME without multithreading
  • Fixed the optimized CDOT/ZDOT kernel to compile with LLVM under Windows on Arm
  • Renamed the copy of the DllMain function used in static linking on MS Windows to
    OpenBLASDllMain to avoid symbol name conflicts with other libraries

loongarch64:

  • Fixed POTRF returning incorrect results on LA464 due to a wrong parameter setting

power:

  • Fixed compilation problems caused by missing support for half-precision floats (FP16)
  • Fixed a potential miscompilation of the POWER10 DGEMV kernel by limiting its optimization
    level
  • Fixed a SCAL issue on PPCG4/PPC970 running Linux
  • Worked around a SCAL issue on PPC970 running FreeBSD by switching to the generic C kernels

riscv64:

  • Optimized the CROT/ZROT kernel for vector length 128 in the non-unit stride path
  • Improved SBGEMM/SHGEMM and related helper functions for type conversion
  • Fixed probing for BFLOAT16 support in DYNAMIC_ARCH cpu detection at runtime

x86_64:

  • Fixed a potential miscompilation (by gcc 15.x) of the AVX512 SGEMM kernel for "small"
    matrix sizes
  • Fixed the SROT and DROT kernels for Haswell to have consistent (FMA) rounding
    in the main loop and tail call
  • Added automatic detection of Intel Arrow Lake H/U, Panther Lake and Jasper Lake
  • Added automatic detection of Intel Emerald Rapids and upcoming cpu models
  • Updated the cache size translation table in the cpu model autodetection code
  • Improved cpu detection fallback to also include Nehalem as a non-AVX option
  • Fixed a Makefile build issue with clang and the SkylakeX SGEMM kernel
  • Renamed the copy of the DllMain function used in static linking on MS Windows to
    OpenBLASDllMain to avoid symbol name conflicts with other libraries
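
The SROT/DROT entry above concerns rounding that differed between the main loop and the tail. The release notes do not show the kernel code, so the following is only an illustrative sketch of why mixing fused and unfused arithmetic matters: a fused multiply-add rounds once, while a separate multiply and add round twice. The function names are hypothetical, and the fused result is emulated with exact rational arithmetic rather than a hardware FMA.

```python
from fractions import Fraction

def rot_unfused(c, s, x, y):
    # Both products are rounded, then the sum is rounded again.
    return c * x + s * y

def rot_fused_emulated(c, s, x, y):
    # Emulates fma(c, x, s*y): s*y is rounded first, then c*x plus that
    # value is formed exactly and rounded once on conversion to float.
    exact = Fraction(c) * Fraction(x) + Fraction(s * y)
    return float(exact)

# Inputs chosen so that the low-order bit of c*x is lost without fusing.
c = x = 1.0 + 2.0 ** -30
s, y = -1.0, 1.0 + 2.0 ** -29

print(rot_unfused(c, s, x, y))         # rounded product cancels exactly
print(rot_fused_emulated(c, s, x, y))  # keeps the 2**-60 term of c*x
```

With these inputs the unfused form yields 0.0 while the fused form yields 2**-60, so a vector whose last elements went through a different code path could differ from one processed entirely in the main loop.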

wasm:

  • Added optimized intrinsics kernels for SGEMM and DGEMM as well as DOT, ROT and SUM

md5sums:
c2e1ba0fdf634b44da789a4323df012c OpenBLAS-0.3.32.zip
021eb76c3fc66290b6ce14fa4c1ff3de OpenBLAS-0.3.32.tar.gz
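
A downloaded archive can be checked against the md5sums listed above. The snippet below is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_published_md5` are hypothetical, and the file path in the usage note assumes the archive sits in the current directory.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published_md5(path, expected_hex):
    """Compare a file's digest with a published checksum, ignoring case."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_md5("OpenBLAS-0.3.32.tar.gz", "021eb76c3fc66290b6ce14fa4c1ff3de")` would return True for an intact copy of the tarball. An MD5 match guards against corrupted downloads only; it is not a defence against deliberate tampering.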
