OpenMathLib/OpenBLAS v0.3.11 on GitHub

NOTE: Unfortunately, this version appears to contain several defects. It should not be redistributed or used in a production environment.

common:

API change:

    The newly added BFLOAT16 functions were renamed to use the
    letter "B" instead of "H" to avoid potential confusion with
    the IEEE "half precision float" type, i.e. the SHGEMM of
    0.3.10 is now SBGEMM, and the corresponding build option
    was changed from "BUILD_HALF" to "BUILD_BFLOAT16".
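
To illustrate why the two 16-bit types are easy to confuse yet not interchangeable, here is a minimal Python sketch of the bfloat16 layout: a bfloat16 value is the upper 16 bits of an IEEE single-precision float (1 sign bit, 8 exponent bits, 7 mantissa bits), whereas IEEE half precision uses 5 exponent bits and 10 mantissa bits. The function names are hypothetical and the round-to-nearest-even step is an assumption made for this sketch; it is not taken from the OpenBLAS sources.

```python
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x (hypothetical helper).

    Assumes round-to-nearest-even; the rounding used by OpenBLAS may differ.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # NaN: set a mantissa bit so the result is still a NaN after the shift
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return (bits >> 16) | 0x0040
    # Add a bias so that dropping the low 16 bits rounds to nearest, ties to even
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bfloat16_bits_to_float32(b: int) -> float:
    """Widen a bfloat16 bit pattern back to a float by zero-filling the low bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]

if __name__ == "__main__":
    for value in (1.0, 3.14159265, 1e38):
        b = float32_to_bfloat16_bits(value)
        print(f"{value!r:>12} -> 0x{b:04X} -> {bfloat16_bits_to_float32(b)!r}")
```

Because bfloat16 keeps the full 8-bit exponent of single precision, a value such as 1e38 survives the conversion with reduced precision, while it would overflow the range of IEEE half precision (largest finite value 65504).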

Reduced the default BLAS3_MEM_ALLOC_THRESHOLD (used as an upper
limit for placing temporary arrays on the stack) to be compatible
with a stack size of 1 MB (as imposed by the Java runtime library)
Added mixed-precision dot function SBDOT and utility functions
shstobf16, shdtobf16, sbf16tos and dbf16tod to convert between
single or double precision float arrays and bfloat16 arrays
Fixed prototypes of LAPACK_?ggsvp and LAPACK_?ggsvd functions
in lapack.h
Fixed underflow and rounding errors in LAPACK SLANV2 and DLANV2
(causing miscalculations in e.g. SHSEQR/DHSEQR, LAPACK issue #263)
Fixed workspace calculation in LAPACK ?GELQ (LAPACK issue #415)
Fixed several bugs in the LAPACK testsuite
Improved performance of TRMM and TRSM for certain problem sizes
Fixed infinite recursions and workspace miscalculations in ReLAPACK
CMake builds no longer require pkg-config for creating the .pc file
Makefile builds no longer misread NO_CBLAS=0 or NO_LAPACK=0 as
enabling these options
Fixed detection of gfortran when invoked through an MPI wrapper
Improved thread reinitialization performance with OpenMP after a fork
Added support for building only the subset of the library required
for a particular precision by specifying options such as BUILD_SINGLE
or BUILD_DOUBLE
Optional function name prefixes and suffixes are now correctly
reflected in the generated cblas.h
Added CMake build support for the LAPACK and multithreading tests
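
As a rough picture of what a mixed-precision dot function such as SBDOT (listed above) computes, the following Python sketch takes two vectors stored as bfloat16 bit patterns and accumulates their products in a wider type. All function names here are hypothetical, the conversion simply truncates, and the accumulator is a Python float; how the OpenBLAS kernels round and accumulate internally is not stated in these notes.

```python
import struct

def to_bf16(x: float) -> int:
    """Truncate a single-precision value to a bfloat16 bit pattern (no rounding)."""
    return struct.unpack("<I", struct.pack("<f", x))[0] >> 16

def from_bf16(b: int) -> float:
    """Widen a bfloat16 bit pattern to a float by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]

def bf16_dot(x_bits, y_bits) -> float:
    """Dot product of two bfloat16 vectors with a wider accumulator.

    Assumption: inputs are sequences of 16-bit integers of equal length.
    """
    if len(x_bits) != len(y_bits):
        raise ValueError("vectors must have the same length")
    acc = 0.0  # wider than bfloat16, so partial sums are not rounded to 7 mantissa bits
    for xb, yb in zip(x_bits, y_bits):
        acc += from_bf16(xb) * from_bf16(yb)
    return acc

if __name__ == "__main__":
    x = [to_bf16(v) for v in (1.0, 2.0, 3.0)]
    y = [to_bf16(v) for v in (0.5, 0.25, 4.0)]
    print(bf16_dot(x, y))
```

Storing the inputs in 16 bits halves the memory traffic compared with single precision, while the wider accumulator limits the error that would build up if every partial sum were rounded back to bfloat16.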

POWER:

Added optimized support for POWER10
Added support for compiling for POWER8 in 32-bit mode
Added support for compilation with LLVM/clang
Added support for compilation with NVIDIA/PGI compilers
Fixed building on big-endian POWER8
Fixed miscompilation of ZDOTC by GCC 10
Fixed alignment errors in the POWER8 SAXPY kernel
Improved CPU detection on AIX
Added support for building with older compilers on POWER9

x86_64:

Added support for Intel Cooperlake
Added autodetection of AMD Renoir/Matisse/Zen3 CPUs
Added autodetection of Intel Comet Lake CPUs
Reimplemented ?sum, ?dot and daxpy using universal intrinsics
Reset the FPU state before using the FPU on Windows as a workaround
for a problem introduced in Windows 10 build 19041 (a.k.a. SDK 2004)
Fixed potentially undefined behaviour in the dot and gemv_t kernels
Fixed a potential segmentation fault in DYNAMIC_ARCH builds
Fixed building for ZEN with PGI/NVIDIA and AMD AOCC compilers

ARMV7:

Fixed CPU detection on BSD-like systems

ARMV8:

Added preliminary support for Apple Vortex CPUs
Added support for the Cavium ThunderX3T110 CPU
Fixed CPU detection on BSD-like systems
Fixed compilation in -std=c18 mode

IBM Z:

Added support for compiling with the clang compiler
Improved GEMM performance on Z14

md5sums:
dd211b73398383a44ebd75fffabd937a OpenBLAS-0.3.11.tar.gz
a76bfee7c125071bce6b24eae5b07468 OpenBLAS-0.3.11.zip
bad36be9fe4fe40372b06d326cfc5a2f OpenBLAS-0.3.11-x64.zip
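
A downloaded archive can be checked against the sums above with any MD5 tool. The following is a minimal Python sketch using the standard hashlib module; the helper names and the local file path are assumptions made for illustration.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large archives need not fit in memory
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_md5(path, expected_hex):
    """Compare a file's MD5 digest with an expected hex string (case-insensitive)."""
    return md5_of_file(path) == expected_hex.strip().lower()

# Example call, assuming the archive was saved in the current directory:
# matches_md5("OpenBLAS-0.3.11.tar.gz", "dd211b73398383a44ebd75fffabd937a")
```

A match indicates that the download was not corrupted in transit.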
