This release adds support for multi-line cmd values in the configuration.
This:
models:
  "qwen-32b":
    cmd: llama-server --host 127.0.0.1 --port 8999 -ngl 99 --flash-attn -sm row --metrics --cache-type-k q8_0 --cache-type-v q8_0 --ctx-size 80000 --model /mnt/models/Qwen2.5-32B-Instruct-Q8_0.gguf
    proxy: "http://127.0.0.1:8999"
Can now be written like this:
models:
  "qwen-32b":
    cmd: >
      /mnt/nvme/models/llama-server-66c2c9
      --host 127.0.0.1 --port 8999
      -ngl 99
      --flash-attn
      -sm row
      --cache-type-k q8_0 --cache-type-v q8_0
      --metrics
      --ctx-size 80000
      --model /mnt/nvme/models/Qwen2.5-32B-Instruct-Q8_0.gguf
    proxy: "http://127.0.0.1:8999"
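
The ">" after cmd: is YAML's folded block scalar: line breaks between equally indented lines are folded into single spaces, so the multi-line form should reach the command parser as one line. The sketch below illustrates that idea with a shortened, hypothetical command. The fold() helper only approximates YAML folding for this simple case, and shlex.split is a stand-in for whatever argument splitting this project actually performs, which these notes do not describe.

```python
import shlex

# One-line form of a shortened, hypothetical command (not the full example above).
one_line = "llama-server --host 127.0.0.1 --port 8999 -ngl 99 --ctx-size 80000"

# The same command as it might appear under `cmd: >`, one option group per line.
folded_source = """\
llama-server
--host 127.0.0.1 --port 8999
-ngl 99
--ctx-size 80000
"""


def fold(block: str) -> str:
    """Approximate YAML `>` folding for equally indented, non-blank lines.

    Single line breaks become spaces and one trailing newline is kept.
    Blank lines and more-indented lines are NOT handled here.
    """
    return " ".join(block.splitlines()) + "\n"


folded = fold(folded_source)

# Assumption: the command is split on whitespace in a shell-like way.
same_args = shlex.split(folded) == shlex.split(one_line)
print(same_args)
```

Keep every continuation line at the same indentation: in a folded scalar, lines indented further than the first are not folded, so their line breaks would be preserved.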