v0.1.14
Features
- Manual model load/unload: Interactive status badges in the admin panel. Hover to load or unload models directly. Added `POST /admin/api/models/{model_id}/load` endpoint. (#60)
- Per-model TTL: Auto-unload idle models after a configurable timeout (seconds). Pinned models ignore TTL. (#60)
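The per-model TTL rule can be sketched roughly as below. The names (`ModelEntry`, `is_expired`) and the use of a monotonic clock are illustrative assumptions, not the actual implementation:

```python
import time

class ModelEntry:
    """Illustrative record for one loaded model's idle-unload state."""

    def __init__(self, ttl_seconds=None, pinned=False):
        self.ttl_seconds = ttl_seconds   # per-model idle timeout, in seconds (None = no TTL)
        self.pinned = pinned             # pinned models are never auto-unloaded
        self.last_used = time.monotonic()

    def touch(self):
        # Called on each request that hits this model; resets the idle clock.
        self.last_used = time.monotonic()

    def is_expired(self, now=None):
        # Pinned models ignore TTL entirely.
        if self.pinned or self.ttl_seconds is None:
            return False
        now = time.monotonic() if now is None else now
        return (now - self.last_used) >= self.ttl_seconds
```

A background sweep would periodically unload any entry for which `is_expired()` returns true.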
- Streaming usage stats: Support `stream_options.include_usage` for both `/v1/chat/completions` and `/v1/completions` streaming endpoints. Includes oMLX-specific extended timing fields (`time_to_first_token`, `generation_duration`, `prompt_tokens_per_second`, etc.). (#61)
- In-app auto-update: macOS menubar app can now check and install updates automatically. (#59)
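On a client, the usage chunk arrives as one of the SSE `data:` events. A minimal sketch of extracting it, assuming the standard OpenAI-style chunk shape (the sample payloads and field values here are made up for illustration):

```python
import json

def final_usage(sse_lines):
    """Scan SSE 'data:' lines from a streaming completion and return the
    usage object from the last chunk that carries one (sent when
    stream_options.include_usage is enabled)."""
    usage = None
    for line in sse_lines:
        # Skip non-data lines and the terminal [DONE] sentinel.
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue
        chunk = json.loads(line[len("data: "):])
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage

# Hypothetical stream: content deltas carry usage=null, the final chunk
# carries token counts plus the extended timing fields.
sample_lines = [
    'data: {"choices":[{"delta":{"content":"Hi"}}],"usage":null}',
    'data: {"choices":[],"usage":{"prompt_tokens":5,"completion_tokens":1,'
    '"time_to_first_token":0.12,"prompt_tokens_per_second":41.7}}',
    "data: [DONE]",
]
```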
- Admin UI improvements: Direct GB input for all Resource Management settings. Reorganized model settings modal to a consistent 2-column layout.
Bug Fixes
- Fix server becoming permanently unresponsive when memory pressure evicts the only loaded model. Single-model case now aborts active requests instead of evicting, keeping the model loaded for subsequent short-context requests. (#62)
- Fix client hang on multi-model eviction by aborting the victim's active requests before unloading, so clients receive an error message instead of a silent connection drop. (#62)
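The eviction policy behind these two fixes can be sketched as follows. The class and method names are hypothetical stand-ins, not the server's real types; the point is the ordering (abort first, then unload only when another model remains resident):

```python
class LoadedModel:
    """Minimal stand-in for a resident model (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.aborted = False
        self.loaded = True

    def abort_active_requests(self):
        # Clients get an explicit error instead of a silent connection drop.
        self.aborted = True

    def unload(self):
        self.loaded = False

def on_memory_pressure(loaded, victim):
    # Always abort the victim's in-flight requests first so clients are
    # notified. Only unload when another model remains resident; in the
    # single-model case the model stays loaded for subsequent requests.
    victim.abort_active_requests()
    if len(loaded) > 1:
        victim.unload()
        loaded.remove(victim)
```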
- Fix error messages not being delivered to clients on memory pressure abort. Errors are now propagated through SSE streaming as a content delta.
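One plausible shape for delivering such an error as a content delta is sketched below; the exact chunk fields and error text are assumptions, not the server's actual wire format:

```python
import json

def error_delta_event(model: str, message: str) -> str:
    # Hypothetical chunk shape: the error text rides in an ordinary content
    # delta with finish_reason "stop", so any SSE client renders it instead
    # of seeing the connection drop.
    chunk = {
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{
            "index": 0,
            "delta": {"content": f"\n[error: {message}]"},
            "finish_reason": "stop",
        }],
    }
    return f"data: {json.dumps(chunk)}\n\n"
```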
- Fix clipboard copy button not working when accessing via LAN IP (non-secure HTTP context) by adding a `document.execCommand('copy')` fallback. (#63)
- Fix hot cache not being flushed to SSD on server shutdown.