ggml-org/llama.cpp b7352

Warning

Release Format Update: Linux releases will soon use .tar.gz archives instead of .zip. Please make the necessary changes to your deployment scripts.

server: add presets (config) when using multiple models (#17859)

  • llama-server: recursive GGUF loading

Replace the flat directory scan with a recursive traversal using
std::filesystem::recursive_directory_iterator. Support nested
vendor/model layouts (e.g. vendor/model/*.gguf). The model name now
reflects the relative path within --models-dir instead of just the
filename. Files are aggregated by parent directory via a std::map
before constructing each local_model.
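
A minimal sketch of that kind of recursive scan and per-directory grouping,
assuming a simplified stand-in for the server's local_model type (the struct
and function names here are illustrative, not the actual server code):

    #include <filesystem>
    #include <map>
    #include <string>
    #include <vector>

    namespace fs = std::filesystem;

    // Illustrative stand-in for the server's per-model record.
    struct local_model {
        std::string           name;   // relative path used as the model name
        std::vector<fs::path> files;  // all .gguf files found in that directory
    };

    // Walk --models-dir recursively, collect *.gguf files, and group them by
    // parent directory so nested layouts like vendor/model/file.gguf keep
    // their vendor prefix in the model name.
    static std::vector<local_model> scan_models_dir(const fs::path & root) {
        std::map<fs::path, std::vector<fs::path>> by_dir;
        for (const auto & entry : fs::recursive_directory_iterator(root)) {
            if (entry.is_regular_file() && entry.path().extension() == ".gguf") {
                by_dir[entry.path().parent_path()].push_back(entry.path());
            }
        }
        std::vector<local_model> models;
        for (const auto & [dir, files] : by_dir) {
            local_model m;
            m.name  = fs::relative(dir, root).generic_string(); // e.g. "vendor/model"
            m.files = files;
            models.push_back(std::move(m));
        }
        return models;
    }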

  • server : router config POC (INI-based per-model settings)

  • server: address review feedback from @aldehir and @ngxson

PEG parser usage improvements:

  • Simplify parser instantiation (remove arena indirection)
  • Optimize grammar usage (ws instead of zero_or_more, remove optional wrapping)
  • Fix handling of the last line when it has no trailing newline (+ operator instead of <<)
  • Remove redundant end position check

Feature scope:

  • Remove auto-reload feature (will be a separate PR per @ngxson)
  • Keep config.ini auto-creation and template generation
  • Preserve per-model customization logic

Co-authored-by: aldehir aldehir@users.noreply.github.com
Co-authored-by: ngxson ngxson@users.noreply.github.com

  • server: adopt aldehir's line-oriented PEG parser

Complete rewrite of INI parser grammar and visitor:

  • Use p.chars(), p.negate(), p.any() instead of p.until()
  • Support end-of-line comments (key=value # comment)
  • Handle EOF without trailing newline correctly
  • Strict identifier validation ([a-zA-Z_][a-zA-Z0-9_.-]*)
  • Simplified visitor (no pending state, no trim needed)
  • Grammar handles whitespace natively via eol rule

Business validation preserved (example below):

  • Reject section names starting with LLAMA_ARG_*
  • Accept only keys starting with LLAMA_ARG_*
  • Require explicit section before key-value pairs
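
As an illustration of what this accepts, a config.ini in this style could look
roughly like the following (the section names and values are made up; the set
of recognized keys is whatever LLAMA_ARG_* settings the server understands):

    [qwen2.5-7b-instruct]
    LLAMA_ARG_CTX_SIZE=8192          # end-of-line comments are allowed
    LLAMA_ARG_N_GPU_LAYERS=99

    [llama-3.1-8b]
    LLAMA_ARG_THREADS=8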

Co-authored-by: aldehir aldehir@users.noreply.github.com

  • server: fix CLI/env duplication in child processes

Children now receive minimal CLI args (executable, model, port, alias)
instead of inheriting all of the router's args. Global settings are passed
through LLAMA_ARG_* environment variables only, eliminating duplicate
config warnings.

Fixes: router args like -ngl and -fa were previously passed both via the
CLI and the environment, causing 'will be overwritten' warnings on every
child spawn.
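
A rough sketch of that split, with hypothetical helper names (the real server
code differs; this only illustrates the minimal-argv-plus-environment idea):

    #include <cstdlib>   // setenv (POSIX)
    #include <string>
    #include <vector>

    // Only what identifies the child goes on its command line.
    static std::vector<std::string> make_child_args(
            const std::string & server_bin,
            const std::string & model_path,
            int                 port,
            const std::string & alias) {
        return {
            server_bin,
            "--model", model_path,
            "--port",  std::to_string(port),
            "--alias", alias,
        };
    }

    // Shared settings travel once, as LLAMA_ARG_* environment variables,
    // instead of being repeated as CLI flags for every child.
    static void export_shared_settings(int n_gpu_layers, int ctx_size) {
        setenv("LLAMA_ARG_N_GPU_LAYERS", std::to_string(n_gpu_layers).c_str(), 1);
        setenv("LLAMA_ARG_CTX_SIZE",     std::to_string(ctx_size).c_str(),     1);
    }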

  • add common/preset.cpp

  • fix compile

  • cont

  • allow custom-path models

  • add falsey check

  • server: fix router model discovery and child process spawning

  • Sanitize model names: replace / and \ with _ for display (see the sketch after this list)
  • Recursive directory scan with relative path storage
  • Convert relative paths to absolute when spawning children
  • Filter router control args from child processes
  • Refresh args after port assignment for correct port value
  • Fallback preset lookup for compatibility
  • Fix missing argv[0]: store server binary path before base_args parsing
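
A minimal sketch of the name sanitization in the first item of this list
(hypothetical helper name, illustration only):

    #include <string>

    // Make a relative model path safe for display by replacing path
    // separators with underscores, e.g. "vendor/model.gguf" -> "vendor_model.gguf".
    static std::string sanitize_model_name(std::string name) {
        for (char & c : name) {
            if (c == '/' || c == '\\') {
                c = '_';
            }
        }
        return name;
    }
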
  • Revert "server: fix router model discovery and child process spawning"

This reverts commit e3832b4.

  • clarify about "no-" prefix

  • correct render_args() to include binary path

  • also remove arg LLAMA_ARG_MODELS_PRESET for child

  • add co-author for ini parser code

Co-authored-by: aldehir hello@alde.dev

  • also set LLAMA_ARG_HOST

  • add CHILD_ADDR

  • Remove dead code


Co-authored-by: aldehir aldehir@users.noreply.github.com
Co-authored-by: ngxson ngxson@users.noreply.github.com
Co-authored-by: Xuan Son Nguyen son@huggingface.co
Co-authored-by: aldehir hello@alde.dev
