general:
- when the build-time number of preconfigured threads is exceeded
at runtime (by an external program calling BLAS functions from
a larger number of threads), OpenBLAS will now allocate an
auxiliary control structure for up to 512 additional threads
instead of aborting
- added support for Loongson's LoongArch64 cpu architecture
- fixed building OpenBLAS with CMAKE and -DBUILD_BFLOAT16=ON
- added support for building OpenBLAS as a CMAKE subproject
- added support for building for Windows/ARM64 targets with clang
- improved support for building with the IBM xlf compiler
- imported Reference-LAPACK PR 625 (out-of-bounds access in ?LARRV)
- imported Reference-LAPACK PR 597 for testsuite compatibility with
LLVM's libomp
x86_64:
- added SkylakeX S/DGEMM kernels for small problem sizes (M*N*K<=1000000)
- added optimized SBGEMM for Intel Cooper Lake
- reinstated the performance patch for AVX512 SGEMV_T with a proper fix
- added a workaround for a gcc11 tree-vectorizer bug that caused spurious
failures in the test programs for complex BLAS3 when compiling at -O3
(the default for cmake "release" builds)
- added support for runtime cpu count detection under Haiku OS
- worked around a long-standing miscompilation issue of the Haswell DGEMV_T
kernel with gcc that could produce NaN output in some corner cases
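
To make the size limit quoted for the new SkylakeX small-problem S/DGEMM kernels concrete, here is a minimal sketch that checks whether the product of the three GEMM dimensions M, N and K stays at or below 1000000. The function name and macro are hypothetical and not part of the OpenBLAS API; how the library itself selects a kernel is not described in these notes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper for illustration only, not an OpenBLAS function. */
#define SMALL_GEMM_LIMIT 1000000LL

/* Returns true when m*n*k <= SMALL_GEMM_LIMIT for positive dimensions. */
bool fits_small_gemm_limit(int64_t m, int64_t n, int64_t k)
{
    if (m <= 0 || n <= 0 || k <= 0)
        return false;
    /* Compare by division so that very large dimensions cannot overflow. */
    if (m > SMALL_GEMM_LIMIT / n)
        return false;              /* m*n alone already exceeds the limit */
    return k <= SMALL_GEMM_LIMIT / (m * n);
}
```

By this check a 100x100x100 problem sits exactly on the limit, while 101x100x100 falls outside it.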
POWER:
- improved performance of DASUM on POWER10
ARMV8:
- fixed crashes (use of reserved register x18) on Apple M1 under OSX
- fixed building with gcc releases earlier than 5.1
MIPS:
- fixed building under BSD
MIPS64:
- fixed building under BSD
5cd5df5a1541ad414f5874aaae17730f OpenBLAS-0.3.18.tar.gz
0ebf2e1ddc491f37be26bea4e0d1239a OpenBLAS-0.3.18.zip
b76692df00d0b655d4f14058f6c2e10f OpenBLAS-0.3.18-x64.zip
b421f7c47223c5f228c1fe1c66f3f0e1 OpenBLAS-0.3.18-x86.zip