github ggml-org/llama.cpp b8480

2 hours ago

CANN: add RoPE cache preload before ACL graph capture (#20747)

ACL graph capture disallows host-to-device memcpy and device memory
malloc/free on the captured stream. Pre-load the RoPE cache before
capture so that:

  • Host-to-device copies and allocations run on the non-captured stream
  • Cache metadata is populated and the memory pool is warmed up
  • During capture, only on-device computations are recorded; host-side
    and allocation branches are skipped
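
The preload-then-capture ordering described above can be illustrated with a
small, self-contained sketch. This is not the CANN backend code and does not
call the ACL API: MockStream, RopeCache, ensure_rope_cache, rope_forward and
capture_graph are all hypothetical names invented for the illustration. The
mock stream simply rejects host-to-device copies while capturing, and the
cache check is what lets the captured pass skip the host-side branch.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a device stream (NOT the ACL/CANN API).
struct MockStream {
    bool capturing = false;
    std::vector<std::string> recorded;  // ops recorded while capturing

    void begin_capture() { capturing = true; recorded.clear(); }
    void end_capture() { capturing = false; }

    // Host-to-device copy: rejected during capture, mirroring the restriction.
    void memcpy_h2d(std::vector<float>& dst, const std::vector<float>& src) {
        if (capturing) throw std::runtime_error("H2D memcpy during capture");
        dst = src;  // also stands in for the device allocation
    }

    // On-device compute: always allowed, recorded only while capturing.
    void launch(const std::string& name) {
        if (capturing) recorded.push_back(name);
    }
};

// Hypothetical cache: metadata (valid, n_pos) plus a pretend device buffer.
struct RopeCache {
    bool valid = false;
    std::size_t n_pos = 0;
    std::vector<float> device_theta;
};

// Fill the cache if it is stale. The host-side branch (allocation + H2D copy)
// runs only on a miss; on a hit the function returns immediately.
void ensure_rope_cache(MockStream& s, RopeCache& c, std::size_t n_pos) {
    if (c.valid && c.n_pos == n_pos) return;  // hit: skip host-side work
    std::vector<float> host(n_pos);
    for (std::size_t i = 0; i < n_pos; ++i) host[i] = static_cast<float>(i);
    s.memcpy_h2d(c.device_theta, host);
    c.n_pos = n_pos;
    c.valid = true;
}

// The op as it would run inside the graph: cache check, then device compute.
void rope_forward(MockStream& s, RopeCache& c, std::size_t n_pos) {
    ensure_rope_cache(s, c, n_pos);
    assert(c.valid);
    s.launch("rope");
}

// Capture one pass. With preload == true the cache is filled before capture
// begins, so only the device launch is recorded. Returns the recorded op count.
std::size_t capture_graph(MockStream& s, RopeCache& c, std::size_t n_pos,
                          bool preload) {
    if (preload) ensure_rope_cache(s, c, n_pos);  // runs outside capture
    s.begin_capture();
    rope_forward(s, c, n_pos);
    s.end_capture();
    return s.recorded.size();
}
```

In this sketch, calling capture_graph with preload set to true records a
single "rope" launch, while calling it with preload set to false on a cold
cache throws from memcpy_h2d, which is the failure mode the preload is meant
to avoid.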
