Changes in 0.3
- Default to detecting the CUDA device capabilities at configure
  time. If no device is found on the build system, build all "major"
  CUDA capabilities to cut down on build time and library size. (thanks
  to Jeff Hammond for contributing)
- Add support for mixed memory types (thanks to ParTec AG for
  contributing)
- Add HIP backend for stream APIs
- Add automatic HIP SM detection
- Add automatic CUDA SM detection
- Add support for user-specified CUDA compiler
- Add support in --ze-native option to compile for multiple devices
- Add support for --pup-max-nesting < 2 in genpup.py
- Add support for --ze-revision-id to pass to ocloc compiler
- Other bug fixes and code cleanup