Details
server: save and clear idle slots on new task (--clear-idle) (#20993)
- server: clear idle slots KV from VRAM (LLAMA_KV_KEEP_ONLY_ACTIVE)
- server: move idle slot KV clearing to slot release
  The save "cost" is now paid by the finishing request.
- server: add --kv-clear-idle flag, enable by default
- server: skip clearing last idle slot, clear on launch
- server: test --no-kv-clear-idle flag
- server: simplify on-release clearing loop
- server: remove on-release KV clearing, keep launch-only
- cont : clean-up
- tests: update log strings after --clear-idle rename
- tests: use debug tags instead of log message matching
- test: fix Windows CI by dropping temp log file unlink
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
macOS/iOS:
Linux:
- Ubuntu x64 (CPU)
- Ubuntu arm64 (CPU)
- Ubuntu s390x (CPU)
- Ubuntu x64 (Vulkan)
- Ubuntu arm64 (Vulkan)
- Ubuntu x64 (ROCm 7.2)
- Ubuntu x64 (OpenVINO)
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: