server: add auto-sleep after N seconds of idle (#18228)
- implement sleeping at queue level
- implement server-context suspend
- add test
- add docs
- optimization: add fast path
- make sure to free llama_init
- nits
- fix use-after-free
- allow /models to be accessed during sleeping, fix use-after-free
- don't allow accessing /models during sleep, it is not thread-safe
- fix data race on accessing props and model_meta
- small clean up
- trailing whitespace
- rm outdated comments
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: