github jundot/omlx v0.3.1

7 hours ago

Bug Fixes

  • fix TTL expiration unloading models with active in-flight requests — all engine types (LLM, VLM, embedding, reranker, STT, TTS, STS) now report active request count so TTL check skips busy engines (#522)
  • fix VLM mRoPE position state lost during prefill — multi-turn conversations on Qwen2-VL/Qwen2.5-VL could produce degraded output (#531)
  • fix race condition between snapshot writer thread and cleanup
  • fix overly greedy tool call extraction in the thinking fallback: tightened regex to prevent false matches (#484)
  • fix model aliases not resolving in audio endpoints (#525)
  • fix missing mlx-audio optional deps for TTS/STT/STS (#515)
  • fix force_lm benchmark loading failing on VLM-only models (#487)
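
To illustrate the TTL fix (#522), here is a minimal sketch of an expiry check that skips engines with in-flight requests. It is not the omlx implementation: the `Engine` class, its fields, and `engines_to_unload` are hypothetical names chosen for this example. The only behavior taken from the notes is that each engine reports an active request count and the TTL check leaves busy engines loaded.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Engine:
    """Hypothetical stand-in for a loaded model engine (not omlx's real class)."""
    name: str
    ttl_seconds: float
    last_used: float = field(default_factory=time.monotonic)
    active_requests: int = 0  # in-flight request count reported by the engine

    def is_expired(self, now: float) -> bool:
        # An engine is past its TTL once it has been unused for ttl_seconds.
        return now - self.last_used >= self.ttl_seconds


def engines_to_unload(engines, now=None):
    """Return names of engines whose TTL has elapsed and that are idle."""
    now = time.monotonic() if now is None else now
    return [
        e.name
        for e in engines
        # Busy engines are skipped even when their TTL has elapsed.
        if e.is_expired(now) and e.active_requests == 0
    ]
```

In this sketch an engine that is past its TTL but still serving requests stays loaded and is reconsidered on the next check, once its request count drops to zero.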

Improvements

  • make xgrammar optional — auto-detects install method (pip vs uv) and shows correct install command
  • enable faulthandler for native crash diagnostics (#511, #520)
  • re-download notice toggle in HF uploader
  • oQ: update descriptions to reflect current implementation, temporarily disable enhanced quantization UI
  • deps: bump mlx-vlm to 9db27b5
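
Two of the improvements above can be sketched with the Python standard library alone. `faulthandler.enable` is the real stdlib call for dumping Python tracebacks on fatal signals. The `install_hint` helper and its pip-versus-uv heuristic are assumptions made for this example and may differ from how omlx detects the install method.

```python
import faulthandler
import importlib.util
import sys
from typing import Optional


def enable_crash_diagnostics(file=sys.stderr) -> bool:
    """Dump tracebacks of all threads to `file` on fatal signals (e.g. SIGSEGV).

    `file` must stay open for as long as the handler is enabled.
    """
    faulthandler.enable(file=file, all_threads=True)
    return faulthandler.is_enabled()


def install_hint(package: str) -> Optional[str]:
    """Return None if `package` is importable, else a suggested install command.

    Assumed heuristic (not taken from omlx): environments created by uv
    usually do not include pip, so a missing pip module suggests uv.
    """
    if importlib.util.find_spec(package) is not None:
        return None
    has_pip = importlib.util.find_spec("pip") is not None
    if has_pip:
        return f"pip install {package}"
    return f"uv pip install {package}"
```

A caller could run `enable_crash_diagnostics()` once at startup and show `install_hint("xgrammar")` only when a feature that needs the optional package is requested.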
