What's New
Features
- Skip API key verification (localhost): when the server is bound to localhost, you can now disable API key verification for all API endpoints from global settings. This makes local-only workflows frictionless: no more dummy keys needed. The option automatically resets when switching to a public host. (#92)
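A minimal sketch of the gating logic this implies, with hypothetical function and parameter names (the actual implementation is not shown in these notes): verification is bypassed only while the bind address is a loopback interface, so moving to a public host re-enables it.

```python
import ipaddress

def require_api_key(bind_host: str, skip_verification: bool) -> bool:
    """Return True if requests must present a valid API key.

    Hypothetical helper: the skip option only takes effect when the
    server is bound to a loopback address.
    """
    try:
        is_local = ipaddress.ip_address(bind_host).is_loopback
    except ValueError:
        # Not a literal IP; treat only the name "localhost" as local.
        is_local = bind_host == "localhost"
    # On a public bind the skip flag is ignored, matching the
    # auto-reset behavior described above.
    return not (is_local and skip_verification)
```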
- Model alias: set a custom API-visible name for any model via the model settings modal. `/v1/models` returns the alias instead of the directory name, and requests accept both the alias and the original name. Useful when switching between inference providers without reconfiguring clients. (#92)
- Version display: the CLI now shows the version in the startup banner, and the admin navbar displays the running version. (#90)
Bug Fixes
- Loaded model lost after re-discovery: deleting a model or changing settings triggered model re-discovery, which dropped already-loaded engines from the pool. Loaded models now preserve their runtime state across re-discovery. (#89)
- Text-only VLM quant misdetection: text-only quantizations of natively multimodal models (e.g. Qwen 3.5 122B converted via `mlx_lm.convert`) were misdetected as VLM, causing a failed load attempt on every restart. They are now correctly classified as LLM when `vision_config` is absent. (#84)
- SSD cache utilization over 100%: cache utilization could exceed 100% when available disk space shrank after the initial calculation. The value is now clamped properly.
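The clamping fix amounts to bounding the ratio. A minimal sketch, with assumed function and parameter names:

```python
def cache_utilization(used_bytes: int, capacity_bytes: int) -> float:
    """Cache utilization as a fraction in [0.0, 1.0].

    Hypothetical helper: capacity can shrink after it was first
    measured (other data filled the disk), so the raw ratio may
    exceed 1.0 and must be clamped.
    """
    if capacity_bytes <= 0:
        return 1.0  # no capacity left: report full rather than divide by zero
    return min(used_bytes / capacity_bytes, 1.0)
```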
- Reasoning model output token caching: output tokens from reasoning models (with `<think>` tags) were being cached unnecessarily. They are now skipped to avoid polluting the prefix cache.
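The caching guard can be sketched as a simple check; the helper name is an assumption for illustration:

```python
def should_cache_output(text: str) -> bool:
    """Skip prefix-caching for reasoning output.

    Hypothetical guard: any output containing a <think> block is
    excluded so reasoning traces never pollute the prefix cache.
    """
    return "<think>" not in text
```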
UI Improvements
- Model settings modal reordered: alias / model type / ctx window / max tokens / temperature / top p / top k / rep. penalty / ttl / load defaults
- Alias badge shown next to the model name in both the model settings list and the model manager
New Contributors
Thanks to @rsnow for the contribution!