ggml-org/llama.cpp release b8184


vulkan: improve partial offloading performance on AMD (#19976)

  • vulkan: fix and enable cpy_tensor_async function

  • use transfer_queue for async transfers on AMD, synchronize with timeline semaphore

  • update offload_op logic

  • fix missing transfer submission

  • disable async transfer queue on AMD GCN

  • revert op batch size change

  • fix cpy_tensor_async checks

