ggml-org/llama.cpp release b8184


vulkan: improve partial offloading performance on AMD (#19976)

  • vulkan: fix and enable cpy_tensor_async function

  • use transfer_queue for async transfers on AMD, synchronize with timeline semaphore

  • update offload_op logic

  • fix missing transfer submission

  • disable async transfer queue on AMD GCN

  • revert op batch size change

  • fix cpy_tensor_async checks

