ggml-org/llama.cpp release b8846


ggml : reduce CPU overhead in meta backend (#22041)

  • cache subgraph splits when cgraph is unchanged

Skip per-call subgraph construction in ggml_backend_meta_graph_compute when the same ggml_cgraph is used consecutively.

Assign a uid to every subgraph so that CUDA's fast uid-check path is hit as well.

  • Address review comments

  • Keep the scope as is

  • Rename the last_uid and last_n_subgraphs fields, remove the last_max_tmp_size field, and refactor the code.

  • Address review comments

  • Update ggml/src/ggml-backend-meta.cpp

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>


Binaries are available for macOS/iOS, Linux, Android, Windows, and openEuler.
