v0.1.14
Features
- Manual model load/unload: Interactive status badges in the admin panel. Hover to load or unload models directly. Added `POST /admin/api/models/{model_id}/load` endpoint. (#60)
- Per-model TTL: Auto-unload idle models after a configurable timeout (seconds). Pinned models ignore TTL. (#60)
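The per-model TTL rule can be sketched roughly as below. The names (`ModelEntry`, `is_expired`) and the use of a monotonic clock are illustrative assumptions, not the actual implementation:

```python
import time

class ModelEntry:
    """Illustrative record for one loaded model's idle-unload state."""

    def __init__(self, ttl_seconds=None, pinned=False):
        self.ttl_seconds = ttl_seconds   # per-model idle timeout, in seconds (None = no TTL)
        self.pinned = pinned             # pinned models are never auto-unloaded
        self.last_used = time.monotonic()

    def touch(self):
        # Called on each request that hits this model; resets the idle clock.
        self.last_used = time.monotonic()

    def is_expired(self, now=None):
        # Pinned models ignore TTL entirely.
        if self.pinned or self.ttl_seconds is None:
            return False
        now = time.monotonic() if now is None else now
        return (now - self.last_used) >= self.ttl_seconds
```

A background sweep would periodically unload any entry for which `is_expired()` returns true.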
- Streaming usage stats: Support `stream_options.include_usage` for both `/v1/chat/completions` and `/v1/completions` streaming endpoints. Includes oMLX-specific extended timing fields (`time_to_first_token`, `generation_duration`, `prompt_tokens_per_second`, etc.). (#61)
- In-app auto-update: macOS menubar app can now check and install updates automatically. (#59)
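On a client, the usage chunk arrives as one of the SSE `data:` events. A minimal sketch of extracting it, assuming the standard OpenAI-style chunk shape (the sample payloads and field values here are made up for illustration):

```python
import json

def final_usage(sse_lines):
    """Scan SSE 'data:' lines from a streaming completion and return the
    usage object from the last chunk that carries one (sent when
    stream_options.include_usage is enabled)."""
    usage = None
    for line in sse_lines:
        # Skip non-data lines and the terminal [DONE] sentinel.
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue
        chunk = json.loads(line[len("data: "):])
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage

# Hypothetical stream: content deltas carry usage=null, the final chunk
# carries token counts plus the extended timing fields.
sample_lines = [
    'data: {"choices":[{"delta":{"content":"Hi"}}],"usage":null}',
    'data: {"choices":[],"usage":{"prompt_tokens":5,"completion_tokens":1,'
    '"time_to_first_token":0.12,"prompt_tokens_per_second":41.7}}',
    "data: [DONE]",
]
```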
- Admin UI improvements: Direct GB input for all Resource Management settings. Reorganized model settings modal to a consistent 2-column layout.
Bug Fixes
- Fix server becoming permanently unresponsive when memory pressure evicts the only loaded model. Single-model case now aborts active requests instead of evicting, keeping the model loaded for subsequent short-context requests. (#62)
- Fix client hang on multi-model eviction by aborting the victim's active requests before unloading, so clients receive an error message instead of a silent connection drop. (#62)
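The eviction policy behind these two fixes can be sketched as follows. The class and method names are hypothetical stand-ins, not the server's real types; the point is the ordering (abort first, then unload only when another model remains resident):

```python
class LoadedModel:
    """Minimal stand-in for a resident model (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.aborted = False
        self.loaded = True

    def abort_active_requests(self):
        # Clients get an explicit error instead of a silent connection drop.
        self.aborted = True

    def unload(self):
        self.loaded = False

def on_memory_pressure(loaded, victim):
    # Always abort the victim's in-flight requests first so clients are
    # notified. Only unload when another model remains resident; in the
    # single-model case the model stays loaded for subsequent requests.
    victim.abort_active_requests()
    if len(loaded) > 1:
        victim.unload()
        loaded.remove(victim)
```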
- Fix error messages not being delivered to clients on memory pressure abort. Errors are now propagated through SSE streaming as a content delta.
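One plausible shape for delivering such an error as a content delta is sketched below; the exact chunk fields and error text are assumptions, not the server's actual wire format:

```python
import json

def error_delta_event(model: str, message: str) -> str:
    # Hypothetical chunk shape: the error text rides in an ordinary content
    # delta with finish_reason "stop", so any SSE client renders it instead
    # of seeing the connection drop.
    chunk = {
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{
            "index": 0,
            "delta": {"content": f"\n[error: {message}]"},
            "finish_reason": "stop",
        }],
    }
    return f"data: {json.dumps(chunk)}\n\n"
```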
- Fix clipboard copy button not working when accessing via LAN IP (non-secure HTTP context) by adding a `document.execCommand('copy')` fallback. (#63)
- Fix hot cache not being flushed to SSD on server shutdown.