jundot/omlx v0.3.1 on GitHub

fix TTL expiration unloading models with active in-flight requests — all engine types (LLM, VLM, embedding, reranker, STT, TTS, STS) now report active request count so TTL check skips busy engines (#522)
fix VLM mRoPE position state lost during prefill — multi-turn conversations on Qwen2-VL/Qwen2.5-VL could produce degraded output (#531)
fix race condition between snapshot writer thread and cleanup
fix overly greedy tool call extraction in thinking fallback; tightened regex to prevent false matches (#484)
fix model aliases not resolving in audio endpoints (#525)
fix missing mlx-audio optional deps for TTS/STT/STS (#515)
fix force_lm benchmark loading failing on VLM-only models (#487)
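
A minimal sketch of the idea behind the TTL fix (#522), not omlx's actual code: the `Engine` class, its fields, and `engines_to_unload` are hypothetical names chosen for illustration. An engine is treated as eligible for unloading only when its TTL has elapsed and it reports zero active requests.

```python
import time
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Engine:
    """Hypothetical stand-in for any engine type (LLM, VLM, embedding, ...)."""
    name: str
    last_used: float          # monotonic timestamp of the last request
    active_requests: int = 0  # in-flight requests reported by the engine


def engines_to_unload(
    engines: Iterable[Engine],
    ttl_seconds: float,
    now: Optional[float] = None,
) -> List[str]:
    """Return names of engines whose TTL has expired and that are not busy."""
    current = time.monotonic() if now is None else now
    expired = []
    for engine in engines:
        if engine.active_requests > 0:
            # Busy engines are skipped even if their TTL has elapsed.
            continue
        if current - engine.last_used >= ttl_seconds:
            expired.append(engine.name)
    return expired
```

With this shape, a long-running generation keeps its model loaded regardless of how long ago the request started, because the active count is checked before the elapsed time.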

make xgrammar optional — auto-detects install method (pip vs uv) and shows correct install command
enable faulthandler for native crash diagnostics (#511, #520)
add re-download notice toggle in HF uploader
oQ: update descriptions to reflect current implementation, temporarily disable enhanced quantization UI
deps: bump mlx-vlm to 9db27b5
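
For context on the faulthandler change (#511, #520), a minimal sketch using Python's standard `faulthandler` module. The wrapper name `enable_crash_diagnostics` is hypothetical, and where omlx actually makes this call is not stated in these notes. Once enabled, the interpreter dumps Python tracebacks for all threads when a native crash such as a segfault occurs.

```python
import faulthandler
import sys


def enable_crash_diagnostics(stream=None) -> bool:
    """Enable faulthandler once; return whether it is active afterwards.

    `stream` must be a file object with a valid fileno(); it defaults to
    sys.stderr and has to stay open for as long as faulthandler is enabled.
    """
    if not faulthandler.is_enabled():
        target = sys.stderr if stream is None else stream
        faulthandler.enable(file=target, all_threads=True)
    return faulthandler.is_enabled()
```

The same effect is available without code changes by setting the `PYTHONFAULTHANDLER` environment variable or passing `-X faulthandler` to the interpreter.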