ggml-org/llama.cpp release b7492

server: add auto-sleep after N seconds of idle (#18228)

  • implement sleeping at queue level
  • implement server-context suspend
  • add test
  • add docs
  • optimization: add fast path
  • make sure to free llama_init
  • nits
  • fix use-after-free
  • allow /models to be accessed during sleeping, fix use-after-free
  • don't allow accessing /models during sleep, it is not thread-safe
  • fix data race on accessing props and model_meta
  • small clean up
  • trailing whitespace
  • rm outdated comments
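The commits above describe an idle timer that puts the server to sleep after N seconds without requests, plus fixes for the data races that sleeping introduces. As a rough illustration of the pattern (not the actual llama.cpp implementation; all names here are hypothetical), a watchdog thread can wait on a condition variable with a deadline derived from the last activity timestamp, and each incoming request resets the timer and wakes a sleeping server:

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical sketch of an auto-sleep watchdog; names do not match llama.cpp's code.
class idle_watchdog {
public:
    explicit idle_watchdog(std::chrono::milliseconds timeout) : timeout_(timeout) {
        worker_ = std::thread([this] { run(); });
    }
    ~idle_watchdog() {
        {
            std::lock_guard<std::mutex> lk(mtx_);
            stop_ = true;
        }
        cv_.notify_all();
        worker_.join();
    }
    // Called on every request: wakes the server if sleeping and resets the timer.
    void on_activity() {
        std::lock_guard<std::mutex> lk(mtx_);
        last_activity_ = std::chrono::steady_clock::now();
        sleeping_ = false;
        cv_.notify_all();
    }
    bool is_sleeping() const {
        std::lock_guard<std::mutex> lk(mtx_);
        return sleeping_;
    }
private:
    void run() {
        std::unique_lock<std::mutex> lk(mtx_);
        while (!stop_) {
            if (sleeping_) {
                cv_.wait(lk);  // asleep: nothing to time out, wait for activity or shutdown
                continue;
            }
            auto deadline = last_activity_ + timeout_;
            if (cv_.wait_until(lk, deadline) == std::cv_status::timeout &&
                std::chrono::steady_clock::now() >= last_activity_ + timeout_) {
                // A real server would suspend the context / free resources here.
                sleeping_ = true;
            }
        }
    }
    std::chrono::milliseconds timeout_;
    mutable std::mutex mtx_;
    std::condition_variable cv_;
    std::thread worker_;
    std::chrono::steady_clock::time_point last_activity_ = std::chrono::steady_clock::now();
    bool sleeping_ = false;
    bool stop_ = false;
};
```

Guarding all state behind one mutex is what the "fix data race" and "not thread-safe" commits hint at: any endpoint that reads shared state (such as model metadata) while the sleep transition frees resources must either synchronize with it or be rejected during sleep.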

Prebuilt binaries are provided for macOS/iOS, Linux, Windows, and openEuler on the release page.
