general:
- fixed an incorrect cast in the SBGEMM test case that could lead to spurious test failures
- fixed an invalid memory access in the converted C version of the CBLAS tests
- made the BIGNUMA setting automatic when the number of cores exceeds 256
- imported recent updates from Reference-LAPACK to realign with its upcoming 3.13.0 release:
  - Implement ?LARF1F and ?ORM2R (Reference-LAPACK PRs 1019,1020,1196,1257)
  - Change loop order in ?GETC2 to improve performance (Reference-LAPACK PR 1023)
  - Change WORK array dimension in ?GELQS/?GEQRS (Reference-LAPACK PR 1094)
  - Add NaN checks for input matrix A in ?GEEV (Reference-LAPACK PR 1136)
  - Fix support for jobu/v in LAPACKE_?GESVDQ_WORK (Reference-LAPACK PRs 1146,1221)
  - Fix display of version number in LAPACK testsuite (Reference-LAPACK PR 1149)
  - Fix DGGES test seed to avoid bad matrix cases (Reference-LAPACK PR 1187)
  - Fix truncation of large WORK array sizes in ZHE (Reference-LAPACK PR 1195)
  - Fix overwriting of LDSWORK parameter in ?TRSYL3 (Reference-LAPACK PR 1206)
  - Fix overwriting of error states in some EIG tests (Reference-LAPACK PR 1207)
  - Remove unused parameter in DORBDB3/ZUNBDB3 (Reference-LAPACK PR 1209)
  - Re-enable testing of ?BB and ?GG driver functions (Reference-LAPACK PR 1211)
  - Fix workspace size calculation in ?TGSEN (Reference-LAPACK PR 774)
  - Fix typos in the EIG DMD tests and initialize the cutoff variable (PRs 1212,1228)
  - Optimize looping in ?LACPY/?LASCL/?LANTR for fat matrices with UPLO=L (PR 1251)
arm64:
- worked around a serious miscompilation of the DDOT kernel by GCC15 (affecting most non-SVE targets, and SVE targets in the case of non-unit array stride)
- fixed an accuracy issue in the GEMV kernel for Neoverse V1 and other SVE targets
- fixed broken STRMM and SSYMM in DYNAMIC_ARCH builds when running on non-SME hardware
- added an optimized SHGEMM kernel for Neoverse N2
- fixed DYNAMIC_ARCH builds under Windows on Arm
- added autodetection of Cortex A75/A76 in DYNAMIC_ARCH builds
- added autodetection of Neoverse V3, currently supported through V2 kernels
- re-added support for the "VORTEX" target in DYNAMIC_ARCH builds with DYNAMIC_LIST
- fixed CMake-based builds that use the "Ninja" generator
loongarch64:
- fixed a build failure due to missing support for the new half-precision float type
- fixed a long-standing bug in asserting 64-bit capability in the c_check helper script
x86_64:
- added a workaround for miscompilation of the AVX512 GEMM kernels by LLVM on Windows
- fixed a build failure in the LAED3 code when compiling with MinGW on Windows
- fixed CMake-based compilation with the NVIDIA HPC compiler
- fixed CMake-based builds that use the "Ninja" generator
wasm:
- added optimized kernels for STRSM and DTRSM
md5sums:
96c5cd9013013faefc294bc57830c77d OpenBLAS-0.3.33.tar.gz
81637d0ac00b6dab6f88988cc35645af OpenBLAS-0.3.33.zip
153b444945694e1b773d2c5e5d2a31b0 OpenBLAS-0.3.33-x86.zip
93022c391fce5298d0576bd25655774b OpenBLAS-0.3.33-x64.zip
e30aab9cfab15a5e0ed4858399ad885a OpenBLAS-0.3.33-x64-64.zip