ggml-org/llama.cpp b8658


server: save and clear idle slots on new task (--clear-idle) (#20993)

  • server: clear idle slots KV from VRAM (LLAMA_KV_KEEP_ONLY_ACTIVE)

  • server: move idle slot KV clearing to slot release

The save "cost" is now paid by the finishing request.

  • server: add --kv-clear-idle flag, enable by default

  • server: skip clearing last idle slot, clear on launch

  • server: test --no-kv-clear-idle flag

  • server: simplify on-release clearing loop

  • server: remove on-release KV clearing, keep launch-only

  • cont : clean-up

  • tests: update log strings after --clear-idle rename

  • tests: use debug tags instead of log message matching

  • test: fix Windows CI by dropping temp log file unlink


Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
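
To picture what "clear idle slots" means in practice, here is a minimal, self-contained C++ sketch of the idea: when a slot is not serving a request, its KV cache can be dropped so the VRAM it held is released, while one idle slot is left intact so its prompt cache can still be reused (mirroring the "skip clearing last idle slot" step above). All names and types below (`Slot`, `clear_idle_slots`, `kv_tokens`) are illustrative assumptions, not the actual llama.cpp server internals; the real behaviour is controlled by the `--clear-idle` server flag from the PR title.

```cpp
#include <cstdio>
#include <vector>

// Illustrative stand-in for a server slot; not the llama.cpp server's real struct.
struct Slot {
    int    id;
    bool   active;     // true while the slot is serving a request
    size_t kv_tokens;  // tokens whose KV data is resident (stand-in for VRAM use)
};

// Clear the KV of every idle slot except one, mirroring the
// "skip clearing last idle slot" step from the commit list: keeping one
// idle cache around lets a follow-up request reuse it without a re-prefill.
static void clear_idle_slots(std::vector<Slot> & slots) {
    bool kept_one = false;
    for (auto it = slots.rbegin(); it != slots.rend(); ++it) {
        if (it->active) {
            continue;          // never touch slots that are busy with a task
        }
        if (!kept_one) {
            kept_one = true;   // spare the last idle slot's cache
            continue;
        }
        it->kv_tokens = 0;     // stand-in for freeing the slot's KV from VRAM
    }
}

int main() {
    std::vector<Slot> slots = {
        {0, false, 512},   // idle, holds cached KV
        {1, true,  256},   // busy
        {2, false, 1024},  // idle, spared by the sweep
    };

    clear_idle_slots(slots);

    for (const auto & s : slots) {
        std::printf("slot %d: active=%d kv_tokens=%zu\n", s.id, (int) s.active, s.kv_tokens);
    }
    return 0;
}
```

Per the PR title, the behaviour is toggled with the server's `--clear-idle` flag.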

Prebuilt binaries are attached to the release for macOS/iOS, Linux, Windows, and openEuler.
