jundot/omlx v0.4.0rc1 on GitHub

0.4.0rc1 is the first release candidate for the native Swift macOS app. The old PyObjC menubar app has been retired, and the macOS bundle now ships as a Swift app with a redesigned onboarding flow, settings UI, status surfaces, model management, and a GitHub Releases-based updater.

This Swift transition was driven by excellent work from @popfido, with follow-up polish and release-path fixes folded in after the initial merge. Thank you for the huge amount of thoughtful work here — this is the biggest user-facing macOS change oMLX has shipped so far, and it substantially raises the quality of the desktop app.

Highlights

Native Swift macOS app. The old PyObjC menubar app has been replaced by a native Swift/SwiftUI app, with new onboarding, settings, status, model management, downloads, integrations, and update flows. by @popfido
Improved menubar and app status. Live port/status updates, StatusKit fixes, version display, and cleaner running-state behavior. by @popfido
Browser chat UI received a major usability overhaul and follow-up message/action fixes. by @beamivalice
xgrammar is bundled into the venvstacks export with the no-torch stub path. by @cfbraun
Memory guard tuning. Throttle/eviction thresholds were relaxed and Custom tier behavior was improved.

Runtime, cache, and scheduler

Per-engine MLX threads eliminate cross-engine stream contamination. by @ivaniguarans
Store-cache and boundary snapshot paths now materialize lazy arrays on the owning thread before async byte extraction. by @aeyeopsdev
Boundary snapshot cleanup races and stale snapshot handling were fixed. by @cfbraun
Predictive prefill throttling and reclaim/requeue behavior reduce mid-stream OOM failures. by @sdiamanEXUS
Paged cache references are released correctly on preflight/prefill rejection paths. by @cfbraun
VLM, SpecPrefill, and draft-model lazy state is materialized on loader threads to avoid stream errors. by @cfbraun
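
The thread-ownership idea behind several of the items above can be sketched in plain Python. This is only an illustration, assuming that "per-engine threads" means each engine runs all of its work on one dedicated thread; `EngineWorker` and `current_thread_id` are hypothetical names, not oMLX or MLX APIs, and no MLX calls are used.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class EngineWorker:
    """Hypothetical wrapper: one dedicated worker thread per engine."""

    def __init__(self, name: str) -> None:
        self.name = name
        # max_workers=1 means every task for this engine runs on the same thread,
        # so thread-bound state never leaks across engines.
        self._executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix=name)

    def run(self, fn, *args):
        """Run fn on this engine's own thread and wait for the result."""
        return self._executor.submit(fn, *args).result()

    def close(self) -> None:
        self._executor.shutdown(wait=True)


def current_thread_id() -> int:
    """Stand-in for real engine work: report which thread executed it."""
    return threading.get_ident()


engine_a = EngineWorker("engine-a")
engine_b = EngineWorker("engine-b")

# Each engine should always observe exactly one thread id, and the two differ.
a_ids = {engine_a.run(current_thread_id) for _ in range(3)}
b_ids = {engine_b.run(current_thread_id) for _ in range(3)}

engine_a.close()
engine_b.close()
```

In the same spirit, work that must happen on the owning thread (such as materializing lazy state) would be submitted through `run` first, and only the finished result handed to other threads.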

MTP, oQ, TurboQuant, and model compatibility

Safe row-wise MTP decoding is enabled for aligned batches, with fallback for unsafe late-join batches.
Qwen3.6 MXFP4 mixed norm conventions and MTP preservation are handled more safely. by @scubamount
TurboQuant now supports batched KV-cache compression and fixes batch merge edge cases. by @popfido
The DFlash/MTP transition now restores Qwen GQA attention hooks.
LFM text MoE model discovery is classified correctly as LLM instead of mlx-audio STS. by @samfenwick

API and integrations

Guided grammar is now exposed as a model setting and maps into the existing structured-output grammar path. by @MrNiceRicee
Anthropic cache-control accounting and model context length reporting were fixed. by @richgoodson
Claude Code compatibility was updated for newer request behavior. by @lx1229
CLI shutdown handles KeyboardInterrupt cleanly. by @fry69
Integration launch context was unified across external tool integrations.
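
For readers unfamiliar with the pattern, clean `KeyboardInterrupt` handling in a CLI usually looks something like the sketch below. It is a generic illustration rather than the actual oMLX code; `run_cli`, `fake_serve`, and `fake_shutdown` are hypothetical names, and the exit status 130 is the common shell convention for an interrupted process, assumed here.

```python
from typing import Callable


def run_cli(serve: Callable[[], None], shutdown: Callable[[], None]) -> int:
    """Run serve() until it returns or the user presses Ctrl-C."""
    try:
        serve()
    except KeyboardInterrupt:
        # Ctrl-C is treated as a normal stop request, not as a crash,
        # so no traceback is printed.
        print("Interrupted, shutting down.")
        return 130  # assumed convention: 128 + SIGINT (2)
    finally:
        # Cleanup runs on both the normal and the interrupted path.
        shutdown()
    return 0


events: list[str] = []


def fake_serve() -> None:
    """Simulate a server loop that is interrupted by Ctrl-C."""
    events.append("serving")
    raise KeyboardInterrupt


def fake_shutdown() -> None:
    events.append("shutdown")


exit_code = run_cli(fake_serve, fake_shutdown)
```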

Admin UI and macOS UI

Downloads now include a model card sheet with metadata, files, and tags. by @popfido
Local Models sorting is now case-insensitive ascending. by @MwC-Trexx
Active Models layout works better on narrow screens. by @samfenwick
Model settings table headers are aligned. by @ilukashin
Server/app settings apply behavior and live port display were cleaned up. by @popfido
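
As a small illustration of what case-insensitive ascending order means for the Local Models list, the Python sketch below compares it with a default sort. The model names are placeholders, and the real UI may implement the sort differently.

```python
# Placeholder model names, mixed case on purpose.
models = ["qwen3", "Llama-3", "gemma", "Mistral"]

# Default string sorting puts uppercase names before lowercase ones.
case_sensitive = sorted(models)

# Sorting on the casefolded name ignores case while keeping the original spelling.
case_insensitive = sorted(models, key=str.casefold)

print(case_sensitive)    # ['Llama-3', 'Mistral', 'gemma', 'qwen3']
print(case_insensitive)  # ['gemma', 'Llama-3', 'Mistral', 'qwen3']
```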

Packaging, CI, and tests

The venvstacks driver is pinned/detected more reproducibly. by @popfido
The mlx-framework venvstacks layer was renamed to mlx-base. by @popfido
A CI workflow and broader unit-test coverage were added. by @Mearman, @cfbraun, @fry69
Python 3.14 was added to the CI matrix. by @fry69
The paroquant dev dependency was bumped to 0.1.15.

New Contributors

Thank you to everyone making their first contribution in this release:

@cfbraun, @chenqianhe, @jcalvert, @MwC-Trexx, @azhangd, @scubamount, @sdiamanEXUS, @ilukashin, @tylerliu, @MrNiceRicee, @lx1229.