0.4.0rc1 is the first release candidate for the native Swift macOS app. The old PyObjC menubar app has been retired, and the macOS bundle now ships as a Swift app with a redesigned onboarding flow, settings UI, status surfaces, model management, and a GitHub Releases-based updater.
This Swift transition was driven by excellent work from @popfido, with follow-up polish and release-path fixes folded in after the initial merge. Thank you for the huge amount of thoughtful work here — this is the biggest user-facing macOS change oMLX has shipped so far, and it substantially raises the quality of the desktop app.
Highlights
- Native Swift macOS app. The old PyObjC menubar app has been replaced by a native Swift/SwiftUI app, with new onboarding, settings, status, model management, downloads, integrations, and update flows. by @popfido
- Improved menubar and app status. Live port/status updates, StatusKit fixes, version display, and cleaner running-state behavior. by @popfido
- Browser chat UI received a major usability overhaul and follow-up message/action fixes. by @beamivalice
- xgrammar is bundled into the venvstacks export with the no-torch stub path. by @cfbraun
- Memory guard tuning relaxed throttle/eviction thresholds and improved Custom tier behavior.
Runtime, cache, and scheduler
- Per-engine MLX threads eliminate cross-engine stream contamination. by @ivaniguarans
- Store-cache and boundary snapshot paths now materialize lazy arrays on the owning thread before async byte extraction. by @aeyeopsdev
- Boundary snapshot cleanup races and stale snapshot handling were fixed. by @cfbraun
- Predictive prefill throttling and reclaim/requeue behavior reduce mid-stream OOM failures. by @sdiamanEXUS
- Paged cache references are released correctly on preflight/prefill rejection paths. by @cfbraun
- VLM, SpecPrefill, and draft-model lazy state is materialized on loader threads to avoid stream errors. by @cfbraun
MTP, oQ, TurboQuant, and model compatibility
- Safe row-wise MTP decoding is enabled for aligned batches, with fallback for unsafe late-join batches.
- Qwen3.6 MXFP4 mixed norm conventions and MTP preservation are handled more safely. by @scubamount
- TurboQuant now supports batched KV-cache compression and fixes batch merge edge cases. by @popfido
- DFlash/MTP transition restores Qwen GQA attention hooks.
- LFM text MoE model discovery is classified correctly as LLM instead of mlx-audio STS. by @samfenwick
API and integrations
- Guided grammar is now exposed as a model setting and maps into the existing structured-output grammar path. by @MrNiceRicee
- Anthropic cache-control accounting and model context length reporting were fixed. by @richgoodson
- Claude Code compatibility was updated for newer request behavior. by @lx1229
- CLI shutdown handles KeyboardInterrupt cleanly. by @fry69
- Integration launch context was unified across external tool integrations.
Admin UI and macOS UI
- Downloads now include a model card sheet with metadata, files, and tags. by @popfido
- Local Models sorting is now case-insensitive ascending. by @MwC-Trexx
- Active Models layout works better on narrow screens. by @samfenwick
- Model settings table headers are aligned. by @ilukashin
- Server/app settings apply behavior and live port display were cleaned up. by @popfido
Packaging, CI, and tests
- The venvstacks driver is pinned/detected more reproducibly. by @popfido
- The mlx-framework venvstacks layer was renamed to mlx-base. by @popfido
- CI workflow and broader unit-test coverage were added. by @Mearman, @cfbraun, @fry69
- Python 3.14 was added to the CI matrix. by @fry69
- paroquant dev dependency was bumped to 0.1.15.
New Contributors
Thank you to everyone making their first contribution in this release:
@cfbraun, @chenqianhe, @jcalvert, @MwC-Trexx, @azhangd, @scubamount, @sdiamanEXUS, @ilukashin, @tylerliu, @MrNiceRicee, @lx1229.
