Warning
Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.
server: add presets (config) when using multiple models (#17859)
- llama-server: recursive GGUF loading
Replace the flat directory scan with a recursive traversal using
std::filesystem::recursive_directory_iterator. Support
nested vendor/model layouts (e.g. vendor/model/*.gguf).
The model name now reflects the relative path within --models-dir
instead of just the filename. Files are aggregated by parent
directory via std::map before constructing local_model.
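
A minimal sketch of the traversal described above, not the actual llama-server code. The names `model_groups`, `group_by_parent` and `scan_gguf` are hypothetical, and the real `local_model` construction is left out; only the recursive scan and the grouping by parent directory are shown.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <map>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical type: relative parent directory -> relative paths of its GGUF files.
using model_groups = std::map<std::string, std::vector<std::string>>;

// Group paths (relative to the models directory) by their parent directory.
// Files at the top level end up under the empty key "".
model_groups group_by_parent(const std::vector<fs::path> & rel_paths) {
    model_groups groups;
    for (const auto & rel : rel_paths) {
        assert(rel.is_relative());
        groups[rel.parent_path().generic_string()].push_back(rel.generic_string());
    }
    // Directory iteration order is unspecified, so sort for stable output.
    for (auto & kv : groups) {
        std::sort(kv.second.begin(), kv.second.end());
    }
    return groups;
}

// Walk models_dir recursively and collect every regular *.gguf file.
model_groups scan_gguf(const fs::path & models_dir) {
    std::vector<fs::path> rel_paths;
    if (!fs::is_directory(models_dir)) {
        return {};
    }
    for (const auto & entry : fs::recursive_directory_iterator(models_dir)) {
        if (entry.is_regular_file() && entry.path().extension() == ".gguf") {
            rel_paths.push_back(fs::relative(entry.path(), models_dir));
        }
    }
    return group_by_parent(rel_paths);
}
```

In this sketch, calling `scan_gguf` on a directory that contains `vendor/model/a.gguf` and `vendor/model/b.gguf` yields a single group keyed `vendor/model`, which is one way a multi-file model could be treated as one entry.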
- server: router config POC (INI-based per-model settings)
PEG parser usage improvements:
- Simplify parser instantiation (remove arena indirection)
- Optimize grammar usage (ws instead of zero_or_more, remove optional wrapping)
- Fix last line without newline bug (+ operator instead of <<)
- Remove redundant end position check
Feature scope:
- Remove auto-reload feature (will be separate PR per @ngxson)
- Keep config.ini auto-creation and template generation
- Preserve per-model customization logic
Co-authored-by: aldehir <aldehir@users.noreply.github.com>
Co-authored-by: ngxson <ngxson@users.noreply.github.com>
- server: adopt aldehir's line-oriented PEG parser
Complete rewrite of INI parser grammar and visitor:
- Use p.chars(), p.negate(), p.any() instead of p.until()
- Support end-of-line comments (key=value # comment)
- Handle EOF without trailing newline correctly
- Strict identifier validation ([a-zA-Z_][a-zA-Z0-9_.-]*)
- Simplified visitor (no pending state, no trim needed)
- Grammar handles whitespace natively via eol rule
Business validation preserved:
- Reject section names starting with LLAMA_ARG_*
- Accept only keys starting with LLAMA_ARG_*
- Require explicit section before key-value pairs
Co-authored-by: aldehir <aldehir@users.noreply.github.com>
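
The validation rules listed above can be illustrated with a small hand-written line parser. This is only a sketch: it does not use the PEG parser from the commit, the names `ini_data` and `parse_ini` are hypothetical, and it assumes the strict identifier rule applies to keys while section names only need to be non-empty and must not start with `LLAMA_ARG_`.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical result type: section -> (key -> value).
using ini_data = std::map<std::string, std::map<std::string, std::string>>;

static std::string trim(const std::string & s) {
    const char * ws = " \t\r";
    const size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    return s.substr(b, s.find_last_not_of(ws) - b + 1);
}

// Matches [a-zA-Z_][a-zA-Z0-9_.-]*
static bool is_identifier(const std::string & s) {
    if (s.empty()) return false;
    auto head = [](unsigned char c) { return std::isalpha(c) || c == '_'; };
    auto tail = [](unsigned char c) { return std::isalnum(c) || c == '_' || c == '.' || c == '-'; };
    if (!head(s[0])) return false;
    for (size_t i = 1; i < s.size(); i++) {
        if (!tail(s[i])) return false;
    }
    return true;
}

static bool starts_with(const std::string & s, const std::string & prefix) {
    return s.rfind(prefix, 0) == 0;
}

ini_data parse_ini(const std::string & text) {
    ini_data out;
    std::string section;
    bool have_section = false;
    std::istringstream in(text);
    std::string line;
    // std::getline also returns a final line that has no trailing newline.
    while (std::getline(in, line)) {
        // Simplification: everything after '#' is a comment, even inside a value.
        const size_t hash = line.find('#');
        if (hash != std::string::npos) line.erase(hash);
        line = trim(line);
        if (line.empty()) continue;
        if (line.front() == '[') {
            if (line.back() != ']') throw std::runtime_error("unterminated section header");
            const std::string name = trim(line.substr(1, line.size() - 2));
            if (name.empty() || starts_with(name, "LLAMA_ARG_")) {
                throw std::runtime_error("invalid section name");
            }
            section = name;
            have_section = true;
            out[section];
            continue;
        }
        const size_t eq = line.find('=');
        if (eq == std::string::npos) throw std::runtime_error("expected key=value");
        if (!have_section) throw std::runtime_error("key-value pair before any section");
        const std::string key = trim(line.substr(0, eq));
        if (!is_identifier(key) || !starts_with(key, "LLAMA_ARG_")) {
            throw std::runtime_error("invalid key: " + key);
        }
        assert(!key.empty());
        out[section][key] = trim(line.substr(eq + 1));
    }
    return out;
}
```

A config such as `[my-model]` followed by `LLAMA_ARG_CTX_SIZE = 4096 # comment` parses into one section with one key, while a key placed before any section header, or a section named `LLAMA_ARG_...`, raises `std::runtime_error`.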
- server: fix CLI/env duplication in child processes
Children now receive minimal CLI args (executable, model, port, alias)
instead of inheriting all router args. Global settings pass through
LLAMA_ARG_* environment variables only, eliminating duplicate config
warnings.
Fixes: router args such as -ngl and -fa were passed via both CLI and env,
causing 'will be overwritten' warnings on every child spawn.
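
A rough sketch of the split described above, under stated assumptions: `child_spawn` and `make_child_spawn` are hypothetical names, the child is assumed to accept `--model`, `--port` and `--alias`, and the actual process spawning is omitted. Only `LLAMA_ARG_*` variables are forwarded, so no setting reaches the child through both CLI and environment.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical container for what a child process would be launched with.
struct child_spawn {
    std::vector<std::string> argv; // minimal CLI: executable, model, port, alias
    std::vector<std::string> env;  // "KEY=value" entries, LLAMA_ARG_* only
};

child_spawn make_child_spawn(const std::string & server_bin,
                             const std::string & model_path,
                             int port,
                             const std::string & alias,
                             const std::map<std::string, std::string> & router_env) {
    assert(port > 0 && port <= 65535);
    child_spawn out;
    out.argv = { server_bin,
                 "--model", model_path,
                 "--port",  std::to_string(port),
                 "--alias", alias };
    // Global settings travel through the environment instead of the CLI.
    for (const auto & kv : router_env) {
        if (kv.first.rfind("LLAMA_ARG_", 0) == 0) {
            out.env.push_back(kv.first + "=" + kv.second);
        }
    }
    return out;
}
```

Keeping the argument list this small means a router flag such as -ngl appears only once from the child's point of view, as an environment variable.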
- add common/preset.cpp
- fix compile
- cont
- allow custom-path models
- add falsey check
- server: fix router model discovery and child process spawning
- Sanitize model names: replace / and \ with _ for display
- Recursive directory scan with relative path storage
- Convert relative paths to absolute when spawning children
- Filter router control args from child processes
- Refresh args after port assignment for correct port value
- Fallback preset lookup for compatibility
- Fix missing argv[0]: store server binary path before base_args parsing
- Revert "server: fix router model discovery and child process spawning"
This reverts commit e3832b4.
- clarify about "no-" prefix
- correct render_args() to include binary path
- also remove arg LLAMA_ARG_MODELS_PRESET for child
- add co-author for ini parser code
Co-authored-by: aldehir <hello@alde.dev>
- also set LLAMA_ARG_HOST
- add CHILD_ADDR
- Remove dead code
Co-authored-by: aldehir <aldehir@users.noreply.github.com>
Co-authored-by: ngxson <ngxson@users.noreply.github.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: aldehir <hello@alde.dev>