0.4.0 is the first official release of the native Swift macOS app. The old PyObjC menubar app has been retired, and the macOS bundle now ships as a Swift app with a redesigned onboarding flow, settings UI, status surfaces, model management, and a GitHub Releases-based updater.
This Swift transition was driven by excellent work from @popfido, with follow-up polish and release-path fixes folded in after the initial merge. Thank you for the huge amount of thoughtful work here — this is the biggest user-facing macOS change oMLX has shipped so far, and it substantially raises the quality of the desktop app.
Highlights
- Native Swift macOS app. The old PyObjC menubar app has been replaced by a native Swift/SwiftUI app, with new onboarding, settings, status, model management, downloads, integrations, and update flows. by @popfido
- Improved menubar and app status. Live port/status updates, StatusKit fixes, version display, supervised-server handling, and cleaner running-state behavior. by @popfido
- Standard Hugging Face cache model directory support. oMLX can now discover models from the standard Hugging Face cache location, with controls for toggling HF cache discovery and managing local model directories.
- Safer update flow. App updates now honor the selected update channel and require confirmation before download.
- Browser chat UI received a major usability overhaul and follow-up message/action fixes. by @beamivalice
- xgrammar is bundled into the venvstacks export with the no-torch stub path. by @cfbraun
- Memory guard tuning relaxed throttle/eviction thresholds, improved Custom tier behavior, and added CLI options for memory guard configuration.
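To illustrate the Hugging Face cache discovery item above, the sketch below lists model repositories found in the standard hub cache layout. It is not oMLX's implementation: the helper names `hf_hub_cache_dir` and `discover_cached_models` are hypothetical. It assumes the usual layout, where each model lives in a `models--<org>--<name>` directory under `~/.cache/huggingface/hub`, and that `HF_HUB_CACHE`, `HF_HOME`, or `XDG_CACHE_HOME` may relocate that directory.

```python
import os
from pathlib import Path


def hf_hub_cache_dir(env=None):
    """Resolve the hub cache directory (hypothetical helper, not oMLX API)."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser() / "hub"
    cache_root = env.get("XDG_CACHE_HOME") or "~/.cache"
    return Path(cache_root).expanduser() / "huggingface" / "hub"


def discover_cached_models(cache_dir):
    """Return repo ids such as 'org/name' for cached models with a snapshot."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []
    repo_ids = []
    for entry in sorted(cache_dir.iterdir()):
        if not entry.name.startswith("models--"):
            continue  # skip datasets--*, spaces--*, and stray files
        snapshots = entry / "snapshots"
        if not snapshots.is_dir() or not any(snapshots.iterdir()):
            continue  # nothing downloaded yet, so nothing to load
        # The cache encodes "org/name" as "org--name" after the prefix.
        repo_ids.append(entry.name[len("models--"):].replace("--", "/"))
    return repo_ids
```

Calling `discover_cached_models(hf_hub_cache_dir())` would list whatever is already cached on the machine. A toggle like the one described above could simply skip this call when cache discovery is turned off.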
Runtime, cache, and scheduler
- Per-engine MLX threads eliminate cross-engine stream contamination. by @ivaniguarans
- Store-cache and boundary snapshot paths now materialize lazy arrays on the owning thread before async byte extraction. by @aeyeopsdev
- Boundary snapshot cleanup races and stale snapshot handling were fixed. by @cfbraun
- Predictive prefill throttling and reclaim/requeue behavior reduce mid-stream OOM failures. by @sdiamanEXUS
- Paged cache references are released correctly on preflight/prefill rejection paths. by @cfbraun
- Paged cache now disables itself cleanly when SSD initialization fails instead of breaking startup. by @lvsijian8
- VLM, SpecPrefill, and draft-model lazy state is materialized on loader threads to avoid stream errors. by @cfbraun
- Engine stop now yields back to the event loop so shutdown/restart paths do not monopolize the loop. by @fqx
- Unreadable model directories are handled during startup instead of aborting discovery.
- DMG builds now preserve engine commit metadata.
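The engine-stop item above reflects a general asyncio pattern: a cleanup coroutine that never awaits starves every other task on the loop. Here is a minimal sketch of that idea, assuming a hypothetical `stop_engine` coroutine and made-up request ids. Neither is taken from the oMLX codebase.

```python
import asyncio


async def stop_engine(pending, log):
    """Abort pending requests, yielding to the event loop between items."""
    for request_id in pending:
        log.append(f"abort {request_id}")  # stand-in for real cleanup work
        await asyncio.sleep(0)  # hand control back so other tasks can run


async def heartbeat(log, beats):
    """Another task sharing the loop, for example a health check."""
    for _ in range(beats):
        log.append("heartbeat")
        await asyncio.sleep(0)


async def main():
    log = []
    await asyncio.gather(stop_engine(["r1", "r2"], log), heartbeat(log, 2))
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Without the `await asyncio.sleep(0)` in `stop_engine`, both aborts would be logged before the first heartbeat. With it, the two tasks take turns.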
MTP, oQ, TurboQuant, and model compatibility
- Safe row-wise MTP decoding is enabled for aligned batches, with fallback for unsafe late-join batches.
- Qwen3.6 MXFP4 mixed norm conventions and MTP preservation are handled more safely. by @scubamount
- TurboQuant now supports batched KV-cache compression and fixes batch merge edge cases. by @popfido
- DFlash/MTP transition restores Qwen GQA attention hooks.
- LFM text MoE model discovery is classified correctly as LLM instead of mlx-audio STS. by @samfenwick
- Step 3.7 Flash support is patched through the mlx-lm compatibility path.
API and integrations
- Guided grammar is now exposed as a model setting and maps into the existing structured-output grammar path. by @MrNiceRicee
- Anthropic cache-control accounting and model context length reporting were fixed. by @richgoodson
- `tool_choice: "none"` is respected for MCP tools. by @lvsijian8
- Tool call function names are trimmed while preserving type validation behavior. by @palvaleri
- Wildcard bind addresses such as `0.0.0.0` are normalized to usable local client addresses. by @monroewilliams
- Top-level `omlx` imports are lazy-loaded to improve startup compatibility, including NumPy 2.x environments. by @fparrav
- Claude Code compatibility was updated for newer request behavior. by @lx1229
- CLI shutdown handles `KeyboardInterrupt` cleanly. by @fry69
- Integration launch context was unified across external tool integrations.
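The wildcard bind address item above can be pictured with a small sketch. A wildcard such as 0.0.0.0 means "listen on every interface" and is not a destination a client should connect to, so it is swapped for loopback. The function names `client_host_for` and `client_base_url` are hypothetical and only illustrate the idea. The actual oMLX logic may differ.

```python
import ipaddress


def client_host_for(bind_host):
    """Map a server bind address to a host a local client can connect to."""
    host = bind_host.strip()
    if host in ("", "*"):
        return "127.0.0.1"
    try:
        addr = ipaddress.ip_address(host.strip("[]"))
    except ValueError:
        return host  # a hostname such as "localhost": leave untouched
    if addr.is_unspecified:  # 0.0.0.0 or ::
        return "127.0.0.1" if addr.version == 4 else "::1"
    return host


def client_base_url(bind_host, port):
    """Build a base URL for clients (assumes plain HTTP for illustration)."""
    host = client_host_for(bind_host)
    if ":" in host:  # IPv6 literals need brackets inside a URL
        host = f"[{host.strip('[]')}]"
    return f"http://{host}:{port}"
```

For example, `client_base_url("0.0.0.0", 8000)` gives `http://127.0.0.1:8000`. Port 8000 is an arbitrary value chosen for the example. Specific addresses and hostnames pass through unchanged.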
Admin UI and macOS UI
- Downloads now include a model card sheet with metadata, files, and tags. by @popfido
- Local Models sorting is now case-insensitive ascending. by @MwC-Trexx
- SwiftUI model lists now also sort case-insensitively.
- Active Models layout works better on narrow screens. by @samfenwick
- Model settings table headers are aligned. by @ilukashin
- Server/app settings apply behavior and live port display were cleaned up. by @popfido
- Light mode settings contrast was restored.
- Mac app CLI launch shim and CLI wrapper signing were restored.
- Admin custom-tier memory text is synced with server behavior.
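The case-insensitive sorting items above come down to comparing names on a case-folded key. A minimal sketch in Python follows, with a hypothetical `sort_model_names` helper and made-up model names. The admin UI and the SwiftUI app have their own implementations.

```python
def sort_model_names(names):
    """Sort ascending, ignoring case; the raw name breaks ties deterministically."""
    return sorted(names, key=lambda name: (name.casefold(), name))


names = ["qwen3", "Llama-3", "gemma", "Mistral"]
print(sorted(names))            # plain sort puts uppercase names first
print(sort_model_names(names))  # case-insensitive ascending order
```

A plain `sorted` groups capitalized names ahead of lowercase ones, which is the ordering these fixes avoid.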
Packaging, CI, and tests
- The venvstacks driver is pinned/detected more reproducibly. by @popfido
- The `mlx-framework` venvstacks layer was renamed to `mlx-base`. by @popfido
- CI workflow and broader unit-test coverage were added. by @Mearman, @cfbraun, @fry69
- Python 3.14 was added to the CI matrix. by @fry69
- Formula automation and release URL substitution were corrected.
- paroquant dev dependency was bumped to 0.1.15.
New Contributors
Thank you to everyone making their first contribution in this release:
@cfbraun, @chenqianhe, @jcalvert, @MwC-Trexx, @azhangd, @scubamount, @sdiamanEXUS, @ilukashin, @tylerliu, @MrNiceRicee, @lx1229, @palvaleri, @monroewilliams.
