V9.0.0 Release note
The new features include the following:
- LU factorization: diagonal factorization, panel factorization, and Schur-complement update
  can all be offloaded to GPU
  Environment variables:
  - export SUPERLU_ACC_OFFLOAD=1 (default: enable GPU)
  - export GPU3DVERSION=1 (default; use code in CplusplusFactor/ for all offload)
  - export GPU3DVERSION=0 (only Schur-complement updates are offloaded)
- Triangular solve: new 3D communication-avoiding code
  Environment variables:
  - export SUPERLU_ACC_SOLVE=0 (default; only on CPU)
  - export SUPERLU_ACC_SOLVE=1 (offload to GPU)
  - NOTE: when using multiple GPUs per 2D grid for GPU triangular solve, we use NVSHMEM for fast
    inter-GPU communication. You need to configure NVSHMEM properly.
    For example, on Perlmutter at NERSC, we need the following setup:
NVSHMEM_HOME="path to nvshmem installation"
export NVSHMEM_USE_GDRCOPY=1
export NVSHMEM_MPI_SUPPORT=1
export MPI_HOME=${MPICH_DIR}
export NVSHMEM_LIBFABRIC_SUPPORT=1
export LIBFABRIC_HOME=/opt/cray/libfabric/1.15.2.0
export LD_LIBRARY_PATH=$NVSHMEM_HOME/lib:$LD_LIBRARY_PATH
export NVSHMEM_DISABLE_CUDA_VMM=1
export FI_CXI_OPTIMIZED_MRS=false
export NVSHMEM_BOOTSTRAP_TWO_STAGE=1
export NVSHMEM_BOOTSTRAP=MPI
export NVSHMEM_REMOTE_TRANSPORT=libfabric
- Batched interface to solve many independent systems at the same time
  Driver routine: p[d,s,z]gssvx3d_csc_batch.c
  Example program: p[d,s,z]drive3d.c [ -b batchCount ]
- Julia interface
  https://github.com/JuliaSparse/SuperLUDIST.jl
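The runtime switches listed above can be collected in a small launch-script fragment. The following is a minimal sketch under stated assumptions, not part of the SuperLU_DIST distribution: the helper names `set_superlu_gpu_mode` and `check_nvshmem_env` are hypothetical, and the check covers only the subset of the Perlmutter example that names paths or transports.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of SuperLU_DIST.

# Select GPU usage for factorization and triangular solve.
#   $1: "full"  -> all factorization steps offloaded (GPU3DVERSION=1)
#       "schur" -> only Schur-complement updates offloaded (GPU3DVERSION=0)
#   $2: "cpu" or "gpu" for the triangular solve (default: cpu)
set_superlu_gpu_mode() {
  export SUPERLU_ACC_OFFLOAD=1
  case "${1:-full}" in
    full)  export GPU3DVERSION=1 ;;
    schur) export GPU3DVERSION=0 ;;
    *)     echo "unknown factorization mode: $1" >&2; return 1 ;;
  esac
  case "${2:-cpu}" in
    cpu) export SUPERLU_ACC_SOLVE=0 ;;
    gpu) export SUPERLU_ACC_SOLVE=1 ;;
    *)   echo "unknown solve mode: $2" >&2; return 1 ;;
  esac
}

# Report which NVSHMEM-related variables are still unset.
# Returns non-zero if any of them is missing.
check_nvshmem_env() {
  missing=0
  for name in NVSHMEM_HOME MPI_HOME LIBFABRIC_HOME \
              NVSHMEM_BOOTSTRAP NVSHMEM_REMOTE_TRANSPORT; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "not set: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

set_superlu_gpu_mode full gpu
echo "OFFLOAD=$SUPERLU_ACC_OFFLOAD GPU3DVERSION=$GPU3DVERSION SOLVE=$SUPERLU_ACC_SOLVE"

# Warn (without aborting) if the GPU solve is requested but NVSHMEM is not configured.
if [ "$SUPERLU_ACC_SOLVE" = "1" ] && ! check_nvshmem_env; then
  echo "GPU triangular solve on multiple GPUs needs the NVSHMEM settings" >&2
fi
```

Sourcing such a fragment before the MPI launch command keeps the factorization and solve settings in one place. The NVSHMEM check only warns, since a single-GPU run may not need those settings.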
Dependencies: the following shows what needs to be defined in the CMake build script
- Highly recommended:
  - BLAS:
    -DTPL_ENABLE_INTERNAL_BLASLIB=OFF
    -DTPL_BLAS_LIBRARIES="path to your BLAS library file"
  - ParMETIS:
    -DTPL_ENABLE_PARMETISLIB=ON
    -DTPL_PARMETIS_INCLUDE_DIRS="path to metis and parmetis header files"
    -DTPL_PARMETIS_LIBRARIES="path to metis and parmetis library files"
- If you use GPU triangular solve, you need the following:
  - LAPACK:
    -DTPL_ENABLE_LAPACKLIB=ON
    -DTPL_LAPACK_LIBRARIES="path to lapack library file"
  - NVSHMEM (needed when using multiple GPUs):
    -DTPL_ENABLE_NVSHMEM=ON
    -DTPL_NVSHMEM_LIBRARIES="path to nvshmem files"
- If you use the batched interface, you need MAGMA:
  -DTPL_ENABLE_MAGMALIB=ON
  -DTPL_MAGMA_INCLUDE_DIRS="path to magma header files"
  -DTPL_MAGMA_LIBRARIES="path to magma library file"
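The options above can be assembled into a single configure command. The following is a minimal dry-run sketch that only prints the command. Every path is a placeholder, the `ENABLE_*` switches are hypothetical script variables rather than SuperLU_DIST options, and the ParMETIS switch is written as `TPL_ENABLE_PARMETISLIB` on the assumption that it follows the `TPL_ENABLE_*LIB` pattern of the other options; confirm the names against the project's CMakeLists.txt.

```shell
#!/bin/sh
# Sketch only: all paths are placeholders, and the ENABLE_* switches are
# hypothetical script variables, not SuperLU_DIST options.
ENABLE_GPU_SOLVE=${ENABLE_GPU_SOLVE:-no}   # GPU triangular solve
ENABLE_MULTI_GPU=${ENABLE_MULTI_GPU:-no}   # multiple GPUs (NVSHMEM)
ENABLE_BATCH=${ENABLE_BATCH:-no}           # batched interface (MAGMA)

# Highly recommended: external BLAS and ParMETIS.
set -- \
  -DTPL_ENABLE_INTERNAL_BLASLIB=OFF \
  -DTPL_BLAS_LIBRARIES="/path/to/blas/library/file" \
  -DTPL_ENABLE_PARMETISLIB=ON \
  -DTPL_PARMETIS_INCLUDE_DIRS="/path/to/metis/and/parmetis/headers" \
  -DTPL_PARMETIS_LIBRARIES="/path/to/metis/and/parmetis/libraries"

# GPU triangular solve needs LAPACK.
if [ "$ENABLE_GPU_SOLVE" = "yes" ]; then
  set -- "$@" \
    -DTPL_ENABLE_LAPACKLIB=ON \
    -DTPL_LAPACK_LIBRARIES="/path/to/lapack/library/file"
fi

# NVSHMEM is needed when using multiple GPUs.
if [ "$ENABLE_MULTI_GPU" = "yes" ]; then
  set -- "$@" \
    -DTPL_ENABLE_NVSHMEM=ON \
    -DTPL_NVSHMEM_LIBRARIES="/path/to/nvshmem/files"
fi

# The batched interface needs MAGMA.
if [ "$ENABLE_BATCH" = "yes" ]; then
  set -- "$@" \
    -DTPL_ENABLE_MAGMALIB=ON \
    -DTPL_MAGMA_INCLUDE_DIRS="/path/to/magma/headers" \
    -DTPL_MAGMA_LIBRARIES="/path/to/magma/library/file"
fi

# Dry run: print the configure command instead of executing it.
CONFIGURE_CMD="cmake $* .."
echo "$CONFIGURE_CMD"
```

Once the placeholders point at real installations, the printed command can be run from a build directory. Keeping the optional dependencies behind switches mirrors the grouping in the list above.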
Full Changelog: v8.2.1...v9.0.0