OpenMathLib/OpenBLAS v0.3.33 on GitHub

general:

fixed an incorrect cast in the SBGEMM test case that could lead to spurious test failures
fixed an invalid memory access in the converted C version of the CBLAS tests
made the BIGNUMA setting automatic when the number of cores exceeds 256
imported recent updates from Reference-LAPACK to realign with its upcoming 3.13.0 release:
- Implement ?LARF1F and ?ORM2R (Reference-LAPACK PRs 1019,1020,1196,1257)
- Change loop order in ?GETC2 to improve performance (Reference-LAPACK PR 1023)
- Change WORK array dimension in ?GELQS/?GEQRS (Reference-LAPACK PR 1094)
- Add NaN checks for input matrix A in ?GEEV (Reference-LAPACK PR 1136)
- Fix support for jobu/v in LAPACKE_?GESVDQ_WORK (Reference-LAPACK PRs 1146,1221)
- Fix display of version number in LAPACK testsuite (Reference-LAPACK PR 1149)
- Fix DGGES test seed to avoid bad matrix cases (Reference-LAPACK PR 1187)
- Fix truncation of large WORK array sizes in ZHE (Reference-LAPACK PR 1195)
- Fix overwriting of LDSWORK parameter in ?TRSYL3 (Reference-LAPACK PR 1206)
- Fix overwriting of error states in some EIG tests (Reference-LAPACK PR 1207)
- Remove unused parameter in DORBDB3/ZUNBDB3 (Reference-LAPACK PR 1209)
- Re-enable testing of ?BB and ?GG driver functions (Reference-LAPACK PR 1211)
- Fix workspace size calculation in ?TGSEN (Reference-LAPACK PR 774)
- Fix typos in the EIG DMD tests and initialize the cutoff variable (Reference-LAPACK PRs 1212,1228)
- Optimize looping in ?LACPY/?LASCL/?LANTR with fat matrix and UPLO=L (Reference-LAPACK PR 1251)

arm64:

worked around a serious miscompilation of the DDOT kernel by GCC 15 (affecting
most non-SVE targets, and SVE targets in the case of non-unit array stride)
fixed an accuracy issue in the GEMV kernel for Neoverse V1 and other SVE targets
fixed broken STRMM and SSYMM in DYNAMIC_ARCH builds when running on non-SME hardware
added an optimized SHGEMM kernel for Neoverse N2
fixed DYNAMIC_ARCH builds under Windows on Arm
added autodetection of Cortex A75/A76 in DYNAMIC_ARCH builds
added autodetection of Neoverse V3, currently supported through V2 kernels
re-added support for the "VORTEX" target in DYNAMIC_ARCH builds with DYNAMIC_LIST
fixed CMake-based builds that use the "Ninja" generator

loongarch64:

fixed a build failure due to missing support for the new half-precision float type
fixed a long-standing bug in asserting 64bit capability in the c_check helper script

x86_64:

added a workaround for miscompilation of the AVX512 GEMM kernels by LLVM on Windows
fixed a build failure in the LAED3 code when compiling with MinGW on Windows
fixed CMake-based compilation with the NVIDIA HPC compiler
fixed CMake-based builds that use the "Ninja" generator

wasm:

added optimized kernels for STRSM and DTRSM

md5sums:
96c5cd9013013faefc294bc57830c77d OpenBLAS-0.3.33.tar.gz
81637d0ac00b6dab6f88988cc35645af OpenBLAS-0.3.33.zip
153b444945694e1b773d2c5e5d2a31b0 OpenBLAS-0.3.33-x86.zip
93022c391fce5298d0576bd25655774b OpenBLAS-0.3.33-x64.zip
e30aab9cfab15a5e0ed4858399ad885a OpenBLAS-0.3.33-x64-64.zip
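
The checksums above can be compared against a downloaded archive before unpacking it. The following is a minimal sketch using only the Python standard library; the helper names md5_of_file and verify_download are hypothetical and not part of OpenBLAS, and the script assumes the archive keeps the file name listed above.

```python
import hashlib
from pathlib import Path

# Checksums copied from the md5sums list of this release.
EXPECTED_MD5 = {
    "OpenBLAS-0.3.33.tar.gz": "96c5cd9013013faefc294bc57830c77d",
    "OpenBLAS-0.3.33.zip": "81637d0ac00b6dab6f88988cc35645af",
    "OpenBLAS-0.3.33-x86.zip": "153b444945694e1b773d2c5e5d2a31b0",
    "OpenBLAS-0.3.33-x64.zip": "93022c391fce5298d0576bd25655774b",
    "OpenBLAS-0.3.33-x64-64.zip": "e30aab9cfab15a5e0ed4858399ad885a",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True if the file's MD5 matches the value listed for its name.

    Hypothetical helper: raises KeyError for file names not in the list.
    """
    name = Path(path).name
    if name not in EXPECTED_MD5:
        raise KeyError(f"no checksum listed for {name}")
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, verify_download("OpenBLAS-0.3.33.tar.gz") should return True for an intact copy of the source tarball. MD5 only guards against corrupted or truncated downloads; it is not a defense against deliberate tampering.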
