v9.1.0 Release Note
This release includes the following updates:
- Improved batched interface for solving many independent systems at the same time.
  Internally it uses C++ templates to support multiple datatypes, e.g., complex.
  Please cite this IJHPCA paper when you use the batched functions.
- "SolveOnly" interface: you can input your own LU (or ILU) factored matrices
  and use our parallel, multi-GPU-capable sparse triangular solve routine.
  This is enabled by setting: options->SolveOnly = YES;
  The user still inputs matrix A. Internally, we treat the lower triangle
  of A as the L factor and the upper triangle (including the diagonal) of A as the U factor.
  See the example program EXAMPLE/pddrive3d.c
- Python interface, which currently supports only double precision.
  See PYTHON/README
- Fixed memory leaks in the 3D multi-GPU routines in SRC/CplusplusFactor/
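The "SolveOnly" storage convention (one input matrix A whose lower triangle is read as L and whose upper triangle, including the diagonal, is read as U) can be illustrated with a small serial sketch. This is not the SuperLU_DIST API or its distributed sparse storage. The function name `solve_with_packed_lu`, the dense row-major layout, and the absence of row/column permutations are all assumptions made for illustration. Because the diagonal belongs to U, the sketch also assumes that L has an implicit unit diagonal.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical illustration only, NOT the SuperLU_DIST API.
 * Solves L * U * x = b, where both factors are packed in one dense,
 * row-major n x n array `a`:
 *   - strictly lower triangle      -> L (unit diagonal ASSUMED, not stored)
 *   - upper triangle incl. diagonal -> U
 * No pivoting/permutations are applied (assumption).
 * On entry x holds b; on exit x holds the solution.
 */
void solve_with_packed_lu(size_t n, const double *a, double *x)
{
    assert(a != NULL && x != NULL);

    /* Forward substitution L y = b: unit diagonal, so no division. */
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < i; ++j) {
            x[i] -= a[i * n + j] * x[j];
        }
    }

    /* Backward substitution U x = y, diagonal taken from a. */
    for (size_t i = n; i-- > 0;) {
        for (size_t j = i + 1; j < n; ++j) {
            x[i] -= a[i * n + j] * x[j];
        }
        x[i] /= a[i * n + i];
    }
}
```

For example, packing L = [[1,0,0],[2,1,0],[3,4,1]] and U = [[2,1,1],[0,3,2],[0,0,4]] into the single array {2,1,1, 2,3,2, 3,4,4} and solving with b = (4, 13, 36) yields x = (1, 1, 1). The real routine performs the same two triangular sweeps, but in parallel on distributed sparse factors.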
What's Changed
- Fix the sizeof and add casting to trf3d partition structs by @abagusetty in #162
- Fix memory error when using parallel symbolic factorization (ParMETIS) by @sebastiangrimberg in #164
- Avoid cuda device compiling step when linking against the library. by @eromero-vlc in #170
New Contributors
- @abagusetty made their first contribution in #162
- @sebastiangrimberg made their first contribution in #164
- @eromero-vlc made their first contribution in #170
Full Changelog: v9.0.0...v9.1.0