sched : reintroduce less synchronizations during split compute (#20793)
- CUDA: Improve performance via less synchronizations between token (#17795)
- Adds CPU-to-CUDA copy capability to ggml_backend_cuda_cpy_tensor_async()
- Adds function to relax sync requirements between input copies on supported backends (CUDA for now)
- Exchanges synchronous copy with async copy function.
- Adds macro guards to allow compilation in non-CUDA builds
- Reworked backend detection in ggml-backend.cpp to avoid linking conflicts
- Relaxes the checks in async CUDA copies from backend and buffer type to just buffer type, to avoid linking issues
- Minor cleanup
- Makes the opt-in to relax use of explicit syncs more general. Backends like Vulkan, which require a synchronization between HtoD copies and graph execution, could also adopt this change now.
- Reintroduces stricter check for CPU->CUDA backend async copy via GGML_DEVICE_TYPE_CPU.
- Corrects initialization of ggml_backend_sync_mode in ggml_backend_sched_split initialization
- Simplifies synchronizations to adhere to the saaasg pattern.
- Apply suggestion from @ggerganov (src->buffer to buf_src)
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
- Apply suggestion from @ggerganov (src->buffer to buf_src) v2
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
- Apply suggestions from @JohannesGaessler code review
  Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
- Adds single-GPU synchronizations to multi-GPU settings to fix HIP backend pipeline-parallel bugs.
- Scheduler Hardening: Exclude HIP/MUSA from copy_from_host CPU split -> GPU split optimization
- Scheduler Hardening: Re-adding original additional synchronizations for non-async backends
- Adds disclaimer to the HIP/MUSA exclusion of copy_from_host. Highlights that the exclusion is precautionary, that no performance impact is visible, and that it can be revisited separately at any time.
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
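The idea behind relaxing syncs between input copies and graph execution can be sketched with a toy C++ model. It assumes the copy and the compute step are submitted to the same in-order queue (as with a CUDA stream), so the queue's ordering already guarantees the copy lands before the compute reads it. `ToyStream`, `RunResult`, and `run_tokens` are hypothetical names invented for this sketch. They are not part of ggml or the CUDA runtime, and the actual scheduler logic is more involved.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Toy stand-in for an in-order device queue (for example a CUDA stream).
// enqueue() only records work; the work runs, in submission order, when
// synchronize() is called. This is a model, not a ggml or CUDA API.
class ToyStream {
public:
    void enqueue(std::function<void()> op) { pending_.push(std::move(op)); }

    // Blocks the "host" until all queued work has completed.
    void synchronize() {
        ++sync_count_;
        while (!pending_.empty()) {
            pending_.front()();
            pending_.pop();
        }
    }

    int sync_count() const { return sync_count_; }

private:
    std::queue<std::function<void()>> pending_;
    int sync_count_ = 0;
};

struct RunResult {
    std::vector<float> output; // results read back by the host
    int syncs = 0;             // number of host-side synchronizations
};

// For each token, queues an input copy and a compute step on one stream.
// With sync_after_copy == true the host also waits between the two steps,
// which the in-order queue already makes unnecessary.
RunResult run_tokens(const std::vector<float> & tokens, bool sync_after_copy) {
    ToyStream stream;
    std::vector<float> device_in(1); // stand-in for a device-side input buffer
    RunResult result;
    result.output.resize(tokens.size());

    for (std::size_t i = 0; i < tokens.size(); ++i) {
        const float host_value = tokens[i];
        // "HtoD copy": write the input into the device buffer.
        stream.enqueue([&device_in, host_value] { device_in[0] = host_value; });
        if (sync_after_copy) {
            stream.synchronize();
        }
        // "Graph compute": read the device buffer and produce an output.
        stream.enqueue([&device_in, &result, i] { result.output[i] = 2.0f * device_in[0]; });
        // The host needs this output before preparing the next token.
        stream.synchronize();
    }
    result.syncs = stream.sync_count();
    return result;
}
```

Calling `run_tokens(inputs, true)` and `run_tokens(inputs, false)` from your own `main` yields identical outputs, while the second variant performs one host-side synchronization per token instead of two. Backends without in-order queue semantics would still need the extra wait, which is consistent with the commit re-adding the original synchronizations for non-async backends.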
macOS/iOS:
- macOS Apple Silicon (arm64)
- macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED
- macOS Intel (x64)
- iOS XCFramework
Linux:
- Ubuntu x64 (CPU)
- Ubuntu arm64 (CPU)
- Ubuntu s390x (CPU)
- Ubuntu x64 (Vulkan)
- Ubuntu arm64 (Vulkan)
- Ubuntu x64 (ROCm 7.2)
- Ubuntu x64 (OpenVINO)
- Ubuntu x64 (SYCL FP32)
- Ubuntu x64 (SYCL FP16)
Android:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows arm64 (OpenCL Adreno)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.3 DLLs
- Windows x64 (Vulkan)
- Windows x64 (OpenVINO)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
- DISABLED
- openEuler x86 (310p)
- openEuler x86 (910b, ACL Graph)
- openEuler aarch64 (310p)
- openEuler aarch64 (910b, ACL Graph)
UI: