ggml-org/llama.cpp b7352

Warning

Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.

server: add presets (config) when using multiple models (#17859)

  • llama-server: recursive GGUF loading

Replace the flat directory scan with a recursive traversal using
std::filesystem::recursive_directory_iterator. Support nested
vendor/model layouts (e.g. vendor/model/*.gguf). The model name now
reflects the relative path within --models-dir instead of just the
filename. Files are aggregated by parent directory via a std::map
before constructing each local_model.
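
A minimal sketch of that kind of recursive scan and per-directory grouping,
assuming a simplified stand-in for the server's local_model type (the struct
and function names here are illustrative, not the actual server code):

    #include <filesystem>
    #include <map>
    #include <string>
    #include <vector>

    namespace fs = std::filesystem;

    // Illustrative stand-in for the server's per-model record.
    struct local_model {
        std::string           name;   // relative path used as the model name
        std::vector<fs::path> files;  // all .gguf files found in that directory
    };

    // Walk --models-dir recursively, collect *.gguf files, and group them by
    // parent directory so nested layouts like vendor/model/file.gguf keep
    // their vendor prefix in the model name.
    static std::vector<local_model> scan_models_dir(const fs::path & root) {
        std::map<fs::path, std::vector<fs::path>> by_dir;
        for (const auto & entry : fs::recursive_directory_iterator(root)) {
            if (entry.is_regular_file() && entry.path().extension() == ".gguf") {
                by_dir[entry.path().parent_path()].push_back(entry.path());
            }
        }
        std::vector<local_model> models;
        for (const auto & [dir, files] : by_dir) {
            local_model m;
            m.name  = fs::relative(dir, root).generic_string(); // e.g. "vendor/model"
            m.files = files;
            models.push_back(std::move(m));
        }
        return models;
    }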

  • server : router config POC (INI-based per-model settings)

  • server: address review feedback from @aldehir and @ngxson

PEG parser usage improvements:

  • Simplify parser instantiation (remove arena indirection)
  • Optimize grammar usage (ws instead of zero_or_more, remove optional wrapping)
  • Fix handling of the last line when it has no trailing newline (+ operator instead of <<)
  • Remove redundant end position check

Feature scope:

  • Remove auto-reload feature (will be a separate PR per @ngxson)
  • Keep config.ini auto-creation and template generation
  • Preserve per-model customization logic

Co-authored-by: aldehir aldehir@users.noreply.github.com
Co-authored-by: ngxson ngxson@users.noreply.github.com

  • server: adopt aldehir's line-oriented PEG parser

Complete rewrite of INI parser grammar and visitor:

  • Use p.chars(), p.negate(), p.any() instead of p.until()
  • Support end-of-line comments (key=value # comment)
  • Handle EOF without trailing newline correctly
  • Strict identifier validation ([a-zA-Z_][a-zA-Z0-9_.-]*)
  • Simplified visitor (no pending state, no trim needed)
  • Grammar handles whitespace natively via eol rule

Business validation preserved (example below):

  • Reject section names starting with LLAMA_ARG_*
  • Accept only keys starting with LLAMA_ARG_*
  • Require explicit section before key-value pairs
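
As an illustration of what this accepts, a config.ini in this style could look
roughly like the following (the section names and values are made up; the set
of recognized keys is whatever LLAMA_ARG_* settings the server understands):

    [qwen2.5-7b-instruct]
    LLAMA_ARG_CTX_SIZE=8192          # end-of-line comments are allowed
    LLAMA_ARG_N_GPU_LAYERS=99

    [llama-3.1-8b]
    LLAMA_ARG_THREADS=8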

Co-authored-by: aldehir aldehir@users.noreply.github.com

  • server: fix CLI/env duplication in child processes

Children now receive minimal CLI args (executable, model, port, alias)
instead of inheriting all of the router's args. Global settings are passed
through LLAMA_ARG_* environment variables only, eliminating duplicate
config warnings.

Fixes: router args like -ngl and -fa were previously passed both via the
CLI and the environment, causing 'will be overwritten' warnings on every
child spawn.
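
A rough sketch of that split, with hypothetical helper names (the real server
code differs; this only illustrates the minimal-argv-plus-environment idea):

    #include <cstdlib>   // setenv (POSIX)
    #include <string>
    #include <vector>

    // Only what identifies the child goes on its command line.
    static std::vector<std::string> make_child_args(
            const std::string & server_bin,
            const std::string & model_path,
            int                 port,
            const std::string & alias) {
        return {
            server_bin,
            "--model", model_path,
            "--port",  std::to_string(port),
            "--alias", alias,
        };
    }

    // Shared settings travel once, as LLAMA_ARG_* environment variables,
    // instead of being repeated as CLI flags for every child.
    static void export_shared_settings(int n_gpu_layers, int ctx_size) {
        setenv("LLAMA_ARG_N_GPU_LAYERS", std::to_string(n_gpu_layers).c_str(), 1);
        setenv("LLAMA_ARG_CTX_SIZE",     std::to_string(ctx_size).c_str(),     1);
    }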

  • add common/preset.cpp

  • fix compile

  • cont

  • allow custom-path models

  • add falsey check

  • server: fix router model discovery and child process spawning

  • Sanitize model names: replace / and \ with _ for display (see the sketch after this list)
  • Recursive directory scan with relative path storage
  • Convert relative paths to absolute when spawning children
  • Filter router control args from child processes
  • Refresh args after port assignment for correct port value
  • Fallback preset lookup for compatibility
  • Fix missing argv[0]: store server binary path before base_args parsing
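
A minimal sketch of the name sanitization in the first item of this list
(hypothetical helper name, illustration only):

    #include <string>

    // Make a relative model path safe for display by replacing path
    // separators with underscores, e.g. "vendor/model.gguf" -> "vendor_model.gguf".
    static std::string sanitize_model_name(std::string name) {
        for (char & c : name) {
            if (c == '/' || c == '\\') {
                c = '_';
            }
        }
        return name;
    }
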
  • Revert "server: fix router model discovery and child process spawning"

This reverts commit e3832b4.

  • clarify about "no-" prefix

  • correct render_args() to include binary path

  • also remove arg LLAMA_ARG_MODELS_PRESET for child

  • add co-author for ini parser code

Co-authored-by: aldehir hello@alde.dev

  • also set LLAMA_ARG_HOST

  • add CHILD_ADDR

  • Remove dead code


Co-authored-by: aldehir aldehir@users.noreply.github.com
Co-authored-by: ngxson ngxson@users.noreply.github.com
Co-authored-by: Xuan Son Nguyen son@huggingface.co
Co-authored-by: aldehir hello@alde.dev
