GitHub xiaoyeli/superlu_dist: v9.0.0 release

v9.0.0 Release notes

The new features include the following:

  1. LU factorization: the diagonal factorization, panel factorization, and Schur-complement update
    can all be offloaded to the GPU.
    Environment variables:

    • export SUPERLU_ACC_OFFLOAD=1 (default; enable GPU offload)
      - export GPU3DVERSION=1 (default; use the code in CplusplusFactor/ to offload all three steps)
      - export GPU3DVERSION=0 (offload only the Schur-complement update)
  2. Triangular solve: new 3D communication-avoiding code
    Environment variable:

    • export SUPERLU_ACC_SOLVE=0 (default; solve on the CPU only)
    • export SUPERLU_ACC_SOLVE=1 (offload the solve to the GPU)

    • NOTE: when the GPU triangular solve uses multiple GPUs per 2D grid, it relies on NVSHMEM for fast
      inter-GPU communication, so NVSHMEM must be configured correctly for your system.
      For example, on Perlmutter at NERSC, the following setup is needed:
      NVSHMEM_HOME="path to nvshmem installation"
      export NVSHMEM_USE_GDRCOPY=1
      export NVSHMEM_MPI_SUPPORT=1
      export MPI_HOME=${MPICH_DIR}
      export NVSHMEM_LIBFABRIC_SUPPORT=1
      export LIBFABRIC_HOME=/opt/cray/libfabric/1.15.2.0
      export LD_LIBRARY_PATH=$NVSHMEM_HOME/lib:$LD_LIBRARY_PATH
      export NVSHMEM_DISABLE_CUDA_VMM=1
      export FI_CXI_OPTIMIZED_MRS=false
      export NVSHMEM_BOOTSTRAP_TWO_STAGE=1
      export NVSHMEM_BOOTSTRAP=MPI
      export NVSHMEM_REMOTE_TRANSPORT=libfabric
  3. Batched interface to solve many independent systems at the same time
    Driver routine: p[d,s,z]gssvx3d_csc_batch.c
    Example program: p[d,s,z]drive3d.c [ -b batchCount ]

  4. Julia interface
    https://github.com/JuliaSparse/SuperLUDIST.jl
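The runtime switches above can be combined in a job-script fragment such as the following. It is a minimal sketch, not an official recipe: only the environment variables and the `-b batchCount` option come from these notes, while the launcher (`mpiexec`), the executable path `./EXAMPLE/pddrive3d`, the process-grid options `-r`/`-c`/`-d`, the rank counts, and the matrix file `matrix.rua` are assumptions to adapt to your build and system.

```shell
#!/bin/bash
# Hypothetical job-script fragment. The launcher, executable path, grid sizes
# and matrix file are assumptions; only the environment variables and -b
# come from the release notes.

# LU factorization: offload the diagonal factorization, panel factorization
# and Schur-complement update to the GPU (both values are the defaults).
export SUPERLU_ACC_OFFLOAD=1
export GPU3DVERSION=1
# Alternative: offload only the Schur-complement update.
# export GPU3DVERSION=0

# Triangular solve: run the new 3D solve on the GPU instead of the CPU (default 0).
export SUPERLU_ACC_SOLVE=1

# Assumed 2 x 2 x 2 process grid (8 MPI ranks, one GPU per rank). Each 2 x 2
# 2D grid then spans several GPUs, so the NVSHMEM settings in the note above apply.
mpiexec -n 8 ./EXAMPLE/pddrive3d -r 2 -c 2 -d 2 matrix.rua

# Batched interface (needs a MAGMA-enabled build): batchCount = 16.
# Running it on a single MPI rank is an assumption.
mpiexec -n 1 ./EXAMPLE/pddrive3d -b 16 matrix.rua
```

Keeping `GPU3DVERSION=0` as a commented-out line makes it easy to compare full offload with Schur-complement-only offload on the same problem.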

Dependencies: the following options must be set in the CMake build script.

  1. Highly recommended:
  • BLAS:
    -DTPL_ENABLE_INTERNAL_BLASLIB=OFF
    -DTPL_BLAS_LIBRARIES="path to your BLAS library file"
  • ParMETIS:
    -DTPL_ENABLE_PARMETISLIB=ON
    -DTPL_PARMETIS_INCLUDE_DIRS="path to METIS and ParMETIS header files"
    -DTPL_PARMETIS_LIBRARIES="path to METIS and ParMETIS library files"
  2. If you use the GPU triangular solve, you need the following:
  • LAPACK
    -DTPL_ENABLE_LAPACKLIB=ON
    -DTPL_LAPACK_LIBRARIES="path to LAPACK library file"
  • NVSHMEM is needed when using multiple GPUs
    -DTPL_ENABLE_NVSHMEM=ON
    -DTPL_NVSHMEM_LIBRARIES="path to NVSHMEM files"
  3. If you use the batched interface, you need MAGMA:
    -DTPL_ENABLE_MAGMALIB=ON
    -DTPL_MAGMA_INCLUDE_DIRS="path to MAGMA header files"
    -DTPL_MAGMA_LIBRARIES="path to MAGMA library file"
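Putting these options together, a configure step for a build that uses all three groups might look like the sketch below. It is an illustration under assumptions, not the project's official build script. Every `/path/to/...` value is a placeholder, and `-DTPL_ENABLE_CUDALIB=ON` (the CUDA build switch) is not listed in these notes. The `-S`/`-B` form needs CMake 3.13 or newer, and the array syntax requires bash. Check the README of your SuperLU_DIST version before relying on it.

```shell
#!/bin/bash
# Hypothetical configure step; every /path/to/... value is a placeholder.
opts=(
  # 1. Highly recommended: external BLAS and ParMETIS
  -DTPL_ENABLE_INTERNAL_BLASLIB=OFF
  -DTPL_BLAS_LIBRARIES="/path/to/libblas.so"
  -DTPL_ENABLE_PARMETISLIB=ON
  # Semicolons separate entries of a CMake list.
  -DTPL_PARMETIS_INCLUDE_DIRS="/path/to/metis/include;/path/to/parmetis/include"
  -DTPL_PARMETIS_LIBRARIES="/path/to/libparmetis.so;/path/to/libmetis.so"

  # Assumption: CUDA GPU build switch (not part of these release notes)
  -DTPL_ENABLE_CUDALIB=ON

  # 2. GPU triangular solve: LAPACK, plus NVSHMEM when using multiple GPUs
  -DTPL_ENABLE_LAPACKLIB=ON
  -DTPL_LAPACK_LIBRARIES="/path/to/liblapack.so"
  -DTPL_ENABLE_NVSHMEM=ON
  -DTPL_NVSHMEM_LIBRARIES="/path/to/nvshmem/libraries"

  # 3. Batched interface: MAGMA
  -DTPL_ENABLE_MAGMALIB=ON
  -DTPL_MAGMA_INCLUDE_DIRS="/path/to/magma/include"
  -DTPL_MAGMA_LIBRARIES="/path/to/libmagma.so"
)

# Configure into ./build, then compile in parallel.
cmake -S . -B build "${opts[@]}"
cmake --build build -j
```

Collecting the flags in a bash array keeps a comment next to each dependency group and avoids fragile line continuations. Drop group 2 or 3 if you do not use the GPU triangular solve or the batched interface.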

What's Changed

  • Add create large array for broadcast by @SidShi in #157

Full Changelog: v8.2.1...v9.0.0