ggml-org/llama.cpp b9413 on GitHub

Details

CUDA: Check PTX version on host side to guard PDL dispatch (#23530)

CUDA: Check PTX version on host side to guard PDL dispatch

Checking on __CUDA_ARCH_LIST__ alone is insufficient for JIT, as this
variable doesn't differentiate between compiling for say sm_90, sm_90a
or sm_90f (so forward-jittable PTX vs. arch/family-specific PTX).

Thus, one can have a bug when compiling with
DCMAKE_CUDA_ARCHITECTURES="89;90a", where current code would wrongly
dispatch to PDL on sm_90/sm_120 in forward-JIT mode.

This PR fixes this issue by checking cudaFuncAttributes::ptxVersion of
the incoming kernel at runtime. A check on ptxVersion alone is
sufficient, as device-codes will always be >= ptxVersion (and any
violation of this would be a severe bug in CUDA/nvcc), see:
https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#gpu-code-code-code

Implement MurmurHash3 mixer for better hash distribution

Magic constants were taken from boost:
https://github.com/boostorg/container_hash/blob/2698b43803c012601e6bb1a6116e83767b97986c/include/boost/container_hash/detail/hash_mix.hpp#L19-L65

Update ggml/src/ggml-cuda/common.cuh

Co-authored-by: Johannes Gäßler johannesg@5d6.de

Address review comments, make seed non-zero
Apply code-formatting
Replace std::size_t -> size_t for consistency

Co-authored-by: Johannes Gäßler johannesg@5d6.de

macOS/iOS:

macOS Apple Silicon (arm64)
macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED
macOS Intel (x64)
iOS XCFramework

Linux:

Android:

Android arm64 (CPU)

Windows:

openEuler:

DISABLED
openEuler x86 (310p)
openEuler x86 (910b, ACL Graph)
openEuler aarch64 (310p)
openEuler aarch64 (910b, ACL Graph)

UI: