github jundot/omlx v0.1.14

latest releases: v0.3.6, v0.3.5, v0.3.5-rc1...
one month ago

v0.1.14

Features

  • Manual model load/unload: Interactive status badges in the admin panel. Hover to load or unload models directly. Added POST /admin/api/models/{model_id}/load endpoint. (#60)
  • Per-model TTL: Auto-unload idle models after a configurable timeout (seconds). Pinned models ignore TTL. (#60)
  • Streaming usage stats: Support stream_options.include_usage for both /v1/chat/completions and /v1/completions streaming endpoints. Includes oMLX-specific extended timing fields (time_to_first_token, generation_duration, prompt_tokens_per_second, etc.). (#61)
  • In-app auto-update: macOS menubar app can now check and install updates automatically. (#59)
  • Admin UI improvements: Direct GB input for all Resource Management settings. Reorganized model settings modal to a consistent 2-column layout.

Bug Fixes

  • Fix server becoming permanently unresponsive when memory pressure evicts the only loaded model. Single-model case now aborts active requests instead of evicting, keeping the model loaded for subsequent short-context requests. (#62)
  • Fix client hang on multi-model eviction by aborting the victim's active requests before unloading, so clients receive an error message instead of a silent connection drop. (#62)
  • Fix error messages not being delivered to clients on memory pressure abort. Errors are now propagated through SSE streaming as a content delta.
  • Fix clipboard copy button not working when accessing via LAN IP (non-secure HTTP context) by adding document.execCommand('copy') fallback. (#63)
  • Fix hot cache not being flushed to SSD on server shutdown.

Don't miss a new omlx release

NewReleases is sending notifications on new releases.