github ggml-org/llama.cpp b9413

3 hours ago

CUDA: Check PTX version on host side to guard PDL dispatch (#23530)

  • CUDA: Check PTX version on host side to guard PDL dispatch

Checking __CUDA_ARCH_LIST__ alone is insufficient for JIT, because this
macro does not differentiate between compiling for, say, sm_90, sm_90a,
or sm_90f (that is, forward-jittable PTX vs. arch/family-specific PTX).

As a result, compiling with
-DCMAKE_CUDA_ARCHITECTURES="89;90a" triggers a bug: the current code
wrongly dispatches to PDL on sm_90/sm_120 in forward-JIT mode.

This PR fixes the issue by checking cudaFuncAttributes::ptxVersion of
the incoming kernel at runtime. Checking ptxVersion alone is
sufficient, because the device code version will always be >= ptxVersion
(any violation of this would be a severe bug in CUDA/nvcc); see:
https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#gpu-code-code-code
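
The host-side guard described above can be sketched roughly as follows. This is a simplified illustration, not the code from the PR: `kernel_info`, `PDL_MIN_PTX_VERSION` and `can_use_pdl` are hypothetical names, and the threshold of 90 assumes that the PDL device code is only compiled in for compute capability 9.0 and newer. In real code the value would be read from the `ptxVersion` field filled in by `cudaFuncGetAttributes()`, which encodes the version as major * 10 + minor.

```cpp
// Hypothetical stand-in for the one field of interest. In real CUDA host code
// this value would come from cudaFuncAttributes::ptxVersion, as filled in by
// cudaFuncGetAttributes() for the kernel that is about to be launched.
struct kernel_info {
    int ptx_version; // major * 10 + minor, e.g. 89 for compute_89, 90 for compute_90
};

// Assumption: the PDL code path inside the kernel is only compiled for
// virtual architectures >= 9.0.
constexpr int PDL_MIN_PTX_VERSION = 90;

// Decide on the host whether this particular kernel image may be launched
// with PDL. A kernel JIT-compiled from compute_89 PTX reports 89 here, even
// when it runs on a newer GPU, so it is correctly rejected.
inline bool can_use_pdl(const kernel_info & k) {
    return k.ptx_version >= PDL_MIN_PTX_VERSION;
}
```

With a build for "89;90a", a kernel that was forward-JITed from the compute_89 PTX would report a ptxVersion of 89 and fall back to the regular launch path in this sketch.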

  • Implement MurmurHash3 mixer for better hash distribution

Magic constants were taken from boost:
https://github.com/boostorg/container_hash/blob/2698b43803c012601e6bb1a6116e83767b97986c/include/boost/container_hash/detail/hash_mix.hpp#L19-L65
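
For orientation, a mixer of this kind looks roughly like the sketch below. It uses the canonical 64-bit finalizer constants from MurmurHash3 (fmix64), which are not necessarily the Boost constants used in the PR. The names `fmix64` and `hash_combine_sketch` and the additive constant in the combine step are illustrative assumptions only.

```cpp
#include <cstdint>

// 64-bit finalizer ("fmix64") from MurmurHash3, with its canonical constants.
// Each step is invertible, so the function is a bijection on 64-bit values.
inline uint64_t fmix64(uint64_t h) {
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccdULL;
    h ^= h >> 33;
    h *= 0xc4ceb9fe1a85ec53ULL;
    h ^= h >> 33;
    return h;
}

// Hypothetical helper (not from the PR): fold one value into a running seed.
// The added odd constant keeps the mixer input non-zero when both seed and
// value are zero; the constant itself is an assumption for this sketch.
inline uint64_t hash_combine_sketch(uint64_t seed, uint64_t value) {
    return fmix64(seed + 0x9e3779b97f4a7c15ULL + value);
}
```

Note that this mixer maps 0 to 0, so a zero input passes through unchanged unless something non-zero is added before mixing.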

  • Update ggml/src/ggml-cuda/common.cuh

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

  • Address review comments, make seed non-zero

  • Apply code-formatting

  • Replace std::size_t -> size_t for consistency


Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)
