Details
vulkan: improve partial offloading performance on AMD (#19976)
- vulkan: fix and enable cpy_tensor_async function
- use transfer_queue for async transfers on AMD, synchronize with timeline semaphore
- update offload_op logic
- fix missing transfer submission
- disable async transfer queue on AMD GCN
- revert op batch size change
- fix cpy_tensor_async checks
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: