Highlights
Model settings preset system
One-click vendor-recommended defaults replace the tedious 20-parameter setup dance. The model settings modal now has a unified [ Preset | Global | Model ] scope toggle, and ships with a curated `omlx_preset.json` bundle covering Qwen 3.5/6 (r/nr × general/code), Gemma 4, MiniMax M2.7, gpt-oss-120b, GLM-5/5.1, Llama 4, and Mistral Small 4. Values are copied from each vendor's official recommendation.
A refresh button pulls the latest bundle from omlx.ai, so when a new model lands the presets update in place without a server upgrade. Built on the per-model profile + global template data layer shipped in #853 (thanks @sxc562586657).
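For readers wondering how the three scopes interact: a minimal sketch of the merge, assuming model-specific values override global ones, which in turn override preset defaults (the precedence order, the merge helper, and the parameter names here are illustrative assumptions, not oMLX's actual implementation):

```python
def effective_settings(preset: dict, global_overrides: dict, model_overrides: dict) -> dict:
    """Merge the three scopes: Model overrides win over Global, which wins over Preset.

    Key names are illustrative; the real key set is the ~20 sampler
    parameters the modal exposes.
    """
    merged = dict(preset)            # vendor-recommended baseline
    merged.update(global_overrides)  # admin-wide template
    merged.update(model_overrides)   # per-model profile wins last
    return merged

preset = {"temperature": 0.7, "top_p": 0.8, "top_k": 20}
global_overrides = {"top_p": 0.95}
model_overrides = {"temperature": 0.6}

print(effective_settings(preset, global_overrides, model_overrides))
# {'temperature': 0.6, 'top_p': 0.95, 'top_k': 20}
```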
New Features
- Model settings profile/template data layer with atomic JSON persistence and full CRUD HTTP API (#853)
- Bundled `omlx_preset.json` + `POST /api/presets/refresh` proxy route to fetch updates from omlx.ai
- Qwen 3.6+ thinking preserved across turns on both endpoints: auto-set `preserve_thinking=True` gated on per-model template detection (#856), server-side `<think>` reconstruction from client-provided `reasoning_content` / Anthropic `thinking` blocks (#814), and a native `message.reasoning_content` field path for supporting templates to avoid the whitespace round-trip (#884)
- Global idle timeout dropdown (None / 15m / 30m / 1h / 2h / 8h / 24h) in admin Resource Management. Per-model `ttl_seconds` still wins, pinned models are exempt, and changes live-apply without a restart (#868)
- `hot_cache_only` toggle to run the KV cache entirely in RAM with zero SSD I/O. Closes #605 (#864)
- Qwen3-VL reranker and embedding auto-detection; `/v1/rerank` now accepts `{text, image}` dicts (URL, base64, or local path). Closes #877
- StatusKit Auto-Fix for Tahoe 26.x menubar visibility: one click flips `isAllowed` in the ControlCenter group container plist and restarts ControlCenter, with atomic write + backup + Full Disk Access deep-link. A Bartender-aware conflict dialog was also added
- One-shot post-launch status-item recreate for the Tahoe registration race, About panel polish, a dedicated menubar log at `~/Library/Application Support/oMLX/logs/menubar.log`, and an `omlx diagnose menubar` CLI
- 4 new intelligence benchmarks (BBQ, MathQA, MMLU-Pro, SafetyBench) with per-category UI grouping (Knowledge / Commonsense & Reasoning / Math / Coding / Safety & Alignment). Thanks @michal-stengg (#837)
- `reasoning_content` accepted on the OpenAI `Message` request model; Anthropic `thinking` blocks preserved on the native tool-calling assistant branch (previously dropped) (#814, #884)
- 1% granularity on the Memory Limit and Cold Cache Limit sliders
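To make the new multimodal rerank documents concrete, here is a hedged sketch of building a `/v1/rerank` payload that mixes plain text with `{text, image}` dicts. The `{text, image}` shape and the URL / base64 / local-path image values come from these notes; the `model`/`query`/`documents` envelope and the model name are assumptions modeled on common rerank APIs:

```python
import json

def make_rerank_request(query: str, docs: list) -> dict:
    """Build a /v1/rerank payload mixing plain-text and {text, image} documents.

    Plain strings are wrapped as {"text": ...}; dicts are passed through.
    The envelope fields here are assumptions, not confirmed by the notes.
    """
    documents = []
    for d in docs:
        if isinstance(d, str):
            documents.append({"text": d})
        else:
            documents.append(d)  # already a {text, image} dict
    return {"model": "qwen3-vl-reranker", "query": query, "documents": documents}

payload = make_rerank_request(
    "red sports car",
    [
        "A sedan parked in a garage.",
        {"text": "Product photo", "image": "https://example.com/car.jpg"},
        {"text": "Local capture", "image": "/tmp/car.png"},
    ],
)
print(json.dumps(payload, indent=2))
```

The same shape should work whether the `image` value is a URL, a base64 string, or a local path, per the feature note above.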
Bug Fixes
- Fix oQ quantizing some Qwen3.5 DeltaNet weights that should have stayed in higher precision, causing a quality regression. 5 regression tests added (#913)
- Fix admin model card being unreadable in dark mode. Thanks @miaobuao (#890)
- Stop the `RotatingKVCache` `base_size` warning from spamming on every Gemma 4 request. It's expected sliding-window behavior, so it's DEBUG-level now. Thanks @fqx (#910)
- Fix `RuntimeError: There is no Stream(gpu, 0)` crash on Qwen3.5-family models. Thanks @ysys143 (#891)
- Bump mlx-lm to v0.31.3 and mlx to 0.31.2, picking up upstream fixes for Gemma 4, Mistral, tool parsers, and Metal stream handling (ml-explore/mlx-lm#1090)
- Restore per-field color chips and add a "matched preset" pill so the active preset is obvious at a glance (#897)
- Fix Qwen3.6 MoE losing its SpecPrefill speedup (11.87 → 25.9 tok/s on 35B-A3B-4bit-DWQ). Gemma 3, Qwen3, EXAONE4, DOTS1, and others also benefit. Thanks @mrtkrcm (#846)
- Fix whitespace drift in `<think>` reconstruction on Qwen 3.6+ templates, improving prefix cache reuse across multi-turn reasoning (#884)
- Fix `hot_cache_only` either crashing with Metal GPU panics or silently disabling caching entirely (#864)
- Fix crash on non-mRoPE VLMs (Gemma 4, Pixtral, LLaVA) on vision-feature cache hits (#881)
- Fix Gemma 4 SpecPrefill compatibility, completing @apetersson's initial fix. Closes #668 (#851)
- Fix `preserve_thinking=True` being auto-applied to templates that don't support it, plus 23 new detection tests (#856)
- Return 422 instead of crashing with 500 when a client sends `tool_calls.arguments` in a malformed format. Closes #854
- Fix tool calls being silently dropped when the `qwen3_coder` parser choked on non-Python-literal argument values; falls back to regex now. Closes #882
- Fix MiniMax losing parallel tool calls when a single block contained multiple `<invoke>`s
- Fix per-model Serving Stats filter zeroing counters when models were addressed by alias. Closes #875
- Fix the "Custom" profile badge appearing when only `ttl_seconds` differed from defaults. TTL is now treated as an operational setting (#868)
- Fix chat composer height not resetting after submit and user markdown overflowing the chat pane. Thanks @chulanpro5 (#887)
- Fix menubar visibility false-positives triggered by fullscreen video or slideshows
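The whitespace-drift fix above matters because prefix caching compares raw token prefixes: if the server's reconstructed `<think>` block differs from what the template originally emitted by even one newline, the cached multi-turn prefix no longer matches. A minimal sketch of the idea, with the caveat that the exact delimiters and spacing oMLX's templates use are assumptions here:

```python
def reconstruct_think(reasoning_content: str, answer: str) -> str:
    """Rebuild an assistant turn with its <think> block for the next request.

    Emitting the block byte-for-byte the way the template originally produced
    it (no extra stripping, no added newlines) keeps the multi-turn prefix
    identical, so the server's prefix cache can reuse it. The delimiter
    placement shown is illustrative, not oMLX's actual template.
    """
    return f"<think>\n{reasoning_content}\n</think>\n\n{answer}"

# Reconstructing the same inputs must yield the identical string; any drift
# means a prefix-cache miss and a full re-prefill on the next turn.
turn = reconstruct_think("The user wants X.", "Here is X.")
assert turn == reconstruct_think("The user wants X.", "Here is X.")
```

This is also why the native `message.reasoning_content` path in the New Features list helps: skipping the string round-trip removes the opportunity for drift entirely.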
New Contributors
- @mrtkrcm: Qwen3.6 MoE SpecPrefill (#846)
- @michal-stengg: intelligence benchmarks (#837)
- @RepublicOfKorokke, @fqx: `hot_cache_only` (#701 / #864), `RotatingKVCache` log level (#910)
- @apetersson: Gemma 4 SpecPrefill initial fix (#851)
- @chulanpro5: chat composer fix (#887)
- @ysys143: Metal stream executor fix (#891)
- @miaobuao: model card dark mode styles (#890)
