ggml-org/llama.cpp b8646
on GitHub

one month ago

rpc : reuse compute graph buffers (#21299)

Reuse the buffer backing the ggml context that is used to create the
compute graph on the server side. This partially addresses a memory leak
that arises in the CUDA backend because it uses buffer addresses as
cache keys.

ref: #21265
ref: #20315
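
To illustrate why reusing a buffer helps when a cache is keyed by address, here is a minimal, self-contained sketch. It is not the llama.cpp RPC server or CUDA backend code: `address_keyed_cache`, `cache_entries_fresh` and `cache_entries_reused` are hypothetical names invented for this example, and the fresh buffers are deliberately kept alive so that their addresses are guaranteed to differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a backend cache that uses a buffer address as its key.
// Entries are never evicted, so every new address adds one entry.
struct address_keyed_cache {
    std::unordered_map<const void *, int> entries;

    void touch(const void * key) { entries[key]++; }
    size_t size() const { return entries.size(); }
};

// A new buffer per request: every request presents a new address to the cache.
// Assumption: the buffers are kept alive in `live` only to make the addresses
// distinct in a deterministic way for this demonstration.
size_t cache_entries_fresh(int n_requests, size_t buf_size) {
    address_keyed_cache cache;
    std::vector<std::vector<uint8_t>> live;
    for (int i = 0; i < n_requests; i++) {
        live.emplace_back(buf_size);
        cache.touch(live.back().data());
    }
    return cache.size();
}

// One persistent buffer reused across requests: the address stays stable as
// long as the buffer does not need to grow, so the cache keeps a single entry.
size_t cache_entries_reused(int n_requests, size_t buf_size) {
    address_keyed_cache cache;
    std::vector<uint8_t> ctx_buf;
    for (int i = 0; i < n_requests; i++) {
        if (ctx_buf.size() < buf_size) {
            ctx_buf.resize(buf_size); // reallocates only when more space is needed
        }
        cache.touch(ctx_buf.data());
    }
    return cache.size();
}
```

Under these assumptions, calling `cache_entries_fresh(8, 1024)` yields 8 cache entries, while `cache_entries_reused(8, 1024)` yields 1. A reused buffer can still change address when it has to grow, which is one way such a fix can be only partial in a toy model like this one.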

macOS/iOS:

Linux:

Windows:

openEuler:
