ggml-org/llama.cpp b7492
on GitHub

one month ago

server: add auto-sleep after N seconds of idle (#18228)

implement sleeping at queue level
implement server-context suspend
add test
add docs
optimization: add fast path
make sure to free llama_init
nits
fix use-after-free
allow /models to be accessed during sleeping, fix use-after-free
don't allow accessing /models during sleep, it is not thread-safe
fix data race on accessing props and model_meta
small clean up
trailing whitespace
rm outdated comments
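
The idea behind the commits above (go to sleep after N idle seconds, wake on the next request, and take a fast path when already asleep) can be sketched as a small state machine. This is a minimal illustration only, not the code from #18228: the class `IdleSleeper`, its methods, and the `on_sleep`/`on_wake` callbacks are hypothetical names chosen for this example, and the real server does this work inside its task queue and server context.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical helper, NOT part of llama.cpp: tracks idle time and fires
// callbacks when the server should suspend or resume.
class IdleSleeper {
public:
    using Clock     = std::chrono::steady_clock;
    using TimePoint = Clock::time_point;

    IdleSleeper(std::chrono::seconds idle_timeout,
                std::function<void()> on_sleep,   // assumed: frees model resources
                std::function<void()> on_wake,    // assumed: reloads the model
                TimePoint now = Clock::now())
        : idle_timeout_(idle_timeout),
          on_sleep_(std::move(on_sleep)),
          on_wake_(std::move(on_wake)),
          last_activity_(now) {
        assert(on_sleep_ && on_wake_);
    }

    // Call when a request arrives. Wakes the server first if it is asleep,
    // then resets the idle timer.
    void notify_request(TimePoint now = Clock::now()) {
        if (sleeping_) {
            on_wake_();
            sleeping_ = false;
        }
        last_activity_ = now;
    }

    // Call periodically from the queue loop. Returns true only when this
    // call put the server to sleep.
    bool tick(TimePoint now = Clock::now()) {
        if (sleeping_) {
            return false;  // fast path: already asleep, nothing to do
        }
        if (now - last_activity_ < idle_timeout_) {
            return false;  // still within the idle window
        }
        on_sleep_();
        sleeping_ = true;
        return true;
    }

    bool sleeping() const { return sleeping_; }

private:
    std::chrono::seconds  idle_timeout_;
    std::function<void()> on_sleep_;
    std::function<void()> on_wake_;
    TimePoint             last_activity_;
    bool                  sleeping_ = false;
};
```

Passing the time point explicitly keeps the sketch deterministic and easy to test. A real server would also have to synchronize access to the sleeping state across threads, which is the kind of issue the use-after-free and data-race fixes in the list above address.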

