jundot/omlx v0.3.7

Highlights

Model settings preset system

One-click vendor-recommended defaults replace the tedious 20-parameter setup dance. The model settings modal now has a unified [ Preset | Global | Model ] scope toggle, and ships with a curated omlx_preset.json bundle covering Qwen 3.5/6 (r/nr × general/code), Gemma 4, MiniMax M2.7, gpt-oss-120b, GLM-5/5.1, Llama 4, and Mistral Small 4. Values are copied from each vendor's official recommendation.

A refresh button pulls the latest bundle from omlx.ai, so when a new model lands the presets update in place without a server upgrade. Built on the per-model profile + global template data layer shipped in #853 (thanks @sxc562586657).
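The notes don't document the bundle's on-disk shape. A plausible sketch of what a versioned preset bundle might look like — every field name and parameter value below is an illustrative assumption, not the actual omlx_preset.json:

```json
{
  "schema_version": 1,
  "presets": [
    {
      "name": "Qwen general (non-reasoning)",
      "model_match": "Qwen3.5-*",
      "source": "vendor recommendation",
      "params": {
        "temperature": 0.7,
        "top_p": 0.8,
        "top_k": 20,
        "repetition_penalty": 1.05
      }
    }
  ]
}
```

A schema_version field of some kind is what lets the refresh route reject bundles newer than the server understands.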

New Features

  • Model settings profile/template data layer with atomic JSON persistence and full CRUD HTTP API (#853)
  • Bundled omlx_preset.json + POST /api/presets/refresh proxy route to fetch updates from omlx.ai
  • Qwen 3.6+ thinking preserved across turns on both endpoints: auto-set preserve_thinking=True gated on per-model template detection (#856), server-side <think> reconstruction from client-provided reasoning_content / Anthropic thinking blocks (#814), and native message.reasoning_content field path for supporting templates to avoid the whitespace round-trip (#884)
  • Global idle timeout dropdown (None / 15m / 30m / 1h / 2h / 8h / 24h) in admin Resource Management. Per-model ttl_seconds still wins, pinned models exempt, live-applies without restart (#868)
  • hot_cache_only toggle to run KV cache entirely in RAM with zero SSD I/O. Closes #605 (#864)
  • Qwen3-VL reranker and embedding auto-detection; /v1/rerank now accepts {text, image} dicts (URL, base64, local path). Closes #877
  • StatusKit Auto-Fix for Tahoe 26.x menubar visibility: a one-click fix flips isAllowed in the ControlCenter group container plist and restarts ControlCenter, with an atomic write, a backup, and a Full Disk Access deep-link. A Bartender-aware conflict dialog was also added
  • One-shot post-launch status-item recreation to work around the Tahoe registration race, About panel polish, a dedicated menubar log at ~/Library/Application Support/oMLX/logs/menubar.log, and an omlx diagnose menubar CLI command
  • 4 new intelligence benchmarks (BBQ, MathQA, MMLU-Pro, SafetyBench) with per-category UI grouping (Knowledge / Commonsense & Reasoning / Math / Coding / Safety & Alignment). Thanks @michal-stengg (#837)
  • reasoning_content accepted on OpenAI Message request model; Anthropic thinking blocks preserved on the native tool-calling assistant branch (previously dropped) (#814, #884)
  • 1% granularity on Memory Limit and Cold Cache Limit sliders
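Several items above hinge on rebuilding the model's <think> prefix from the reasoning_content a client sends back, where exact whitespace determines whether the prefix cache can be reused. A minimal sketch of the idea — function and field names are assumptions, not omlx's actual API:

```python
def reconstruct_thinking(messages):
    """Rebuild <think> blocks from client-provided reasoning_content.

    Illustrative sketch only: assistant turns that carry a separate
    reasoning_content field get it re-wrapped as an inline <think>
    prefix before templating, so the model sees its prior reasoning.
    """
    out = []
    for msg in messages:
        msg = dict(msg)  # don't mutate the caller's message
        reasoning = msg.pop("reasoning_content", None)
        if msg.get("role") == "assistant" and reasoning:
            # Exact whitespace matters: an extra newline changes the
            # token stream and defeats prefix-cache reuse across turns.
            msg["content"] = (
                f"<think>\n{reasoning}\n</think>\n\n{msg.get('content', '')}"
            )
        out.append(msg)
    return out
```

Templates with a native reasoning field (per #884) would skip this string round-trip entirely and pass reasoning_content through as-is.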
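The idle-timeout precedence described above (pinned models exempt, per-model ttl_seconds wins over the global dropdown) can be sketched as follows; field names are assumptions for illustration:

```python
def effective_ttl(model, global_idle_timeout):
    """Resolve the idle-unload TTL for a loaded model (sketch).

    Precedence per the release notes: pinned models never idle-unload,
    a per-model ttl_seconds overrides the global setting, and None
    means "never unload".
    """
    if model.get("pinned"):
        return None  # pinned models are exempt from idle unloading
    if model.get("ttl_seconds") is not None:
        return model["ttl_seconds"]  # per-model setting wins
    return global_idle_timeout  # may be None (no idle unload)
```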

Bug Fixes

  • Fix oQ quantizing some Qwen3.5 DeltaNet weights that should have stayed in higher precision, causing a quality regression; five regression tests added (#913)
  • Fix admin model card being unreadable in dark mode. Thanks @miaobuao (#890)
  • Stop the RotatingKVCache base_size warning from spamming on every Gemma 4 request. It's expected sliding-window behavior, so it's DEBUG-level now. Thanks @fqx (#910)
  • Fix RuntimeError: There is no Stream(gpu, 0) crash on Qwen3.5-family models. Thanks @ysys143 (#891)
  • Bump mlx-lm to v0.31.3 and mlx to 0.31.2, picking up upstream fixes for Gemma 4, Mistral, tool parsers, and Metal stream handling (ml-explore/mlx-lm#1090)
  • Restore per-field color chips and add a "matched preset" pill so the active preset is obvious at a glance (#897)
  • Fix Qwen3.6 MoE losing its SpecPrefill speedup (11.87 → 25.9 tok/s on 35B-A3B-4bit-DWQ). Gemma 3, Qwen3, EXAONE4, DOTS1, and others also benefit. Thanks @mrtkrcm (#846)
  • Fix whitespace drift in <think> reconstruction on Qwen 3.6+ templates, improving prefix cache reuse across multi-turn reasoning (#884)
  • Fix hot_cache_only either crashing with Metal GPU panics or silently disabling caching entirely (#864)
  • Fix crash on non-mRoPE VLMs (Gemma 4, Pixtral, LLaVA) when the vision-feature cache hit (#881)
  • Fix Gemma 4 SpecPrefill compatibility, completing @apetersson's initial fix. Closes #668 (#851)
  • Fix preserve_thinking=True being auto-applied to templates that don't support it, plus 23 new detection tests (#856)
  • Return 422 instead of crashing with a 500 when a client sends malformed tool_calls.arguments. Closes #854
  • Fix tool calls being silently dropped when the qwen3_coder parser choked on non-Python-literal argument values; falls back to regex now. Closes #882
  • Fix MiniMax losing parallel tool calls when a single block contained multiple <invoke>s
  • Fix per-model Serving Stats filter zeroing counters when models were addressed by alias. Closes #875
  • Fix the "Custom" profile badge appearing when only ttl_seconds differed from defaults. TTL is now treated as an operational setting (#868)
  • Fix chat composer height not resetting after submit and user markdown overflowing the chat pane. Thanks @chulanpro5 (#887)
  • Fix menubar visibility false-positives triggered by fullscreen video or slideshows
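The MiniMax parallel-tool-call fix comes down to collecting every <invoke> element in a block rather than stopping after the first match. A stand-in sketch of that extraction (the real parser is more involved):

```python
import re

# Non-greedy, DOTALL: each match is one complete <invoke>…</invoke>,
# so a block holding several parallel calls yields several matches.
INVOKE_RE = re.compile(r"<invoke\b[^>]*>.*?</invoke>", re.DOTALL)


def split_invokes(block):
    """Return every <invoke>…</invoke> element found in a tool block."""
    return INVOKE_RE.findall(block)
```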
