jundot/omlx v0.4.4rc2 on GitHub

This is the final release candidate for 0.4.4. A stable release is planned after a short final test pass.

This release candidate focuses on early MiniMax M3 support, stronger macOS 27 compatibility, safer native MTP batching, and API/cache/memory-guard hardening after 0.4.4rc1.

Highlights

Early support for MiniMax M3 via the upstream mlx-vlm PR. oMLX now tracks the not-yet-merged MiniMax M3 work from Blaizzy/mlx-vlm#1374, originally contributed by @ivanfioravanti, so MiniMax M3 / MiniMax M3 VL can be tried before that support lands upstream. This includes native-text VLM adaptation, MiniMax position handling, sparse-attention left-padding fixes, tool-call marker handling, and related prefix/cache support.
Stronger macOS 27 compatibility. oMLX now uses a macOS memory stats compatibility layer for newer HOST_VM_INFO64 layouts, keeping Memory Guard decisions and admin memory telemetry stable on newer macOS releases. (#1749, #1835)
Safer native MTP batching. Native MTP decode now realigns batch rows before decode and defers late-join rows that are not safe for aligned MTP, preventing unsafe speculative batching in mixed-position decode batches.
Improved Gemma 4 and Harmony tool-call robustness. Gemma 4 now handles namespaced MCP tool names, single-quoted and bounded JSON-like arguments, malformed/deep argument payloads, missing tool descriptions, and safer fallback parsing. Harmony output with malformed channels is preserved as visible text instead of disappearing.
Stronger Memory Guard and hot-cache behavior. oMLX now reserves hot-cache headroom under pressure, avoids hot-cache growth when memory is tight, improves prefill rejection diagnostics, and charges unfused SDPA scratch memory at the model compute dtype to reduce false rejections.
macOS app and integration polish. The menu bar app stays responsive under server load, Codex App Desktop launch is available alongside Codex CLI, and Hermes launches through the correct hermes chat flow.
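
To make the Gemma 4 highlight above more concrete, here is a minimal sketch of what bounded, tolerant argument parsing can look like. It is an illustration only, not oMLX's actual parser: the function name `parse_tool_arguments` and the limits `MAX_ARG_CHARS` and `MAX_DEPTH` are hypothetical. The idea is to try strict JSON first, fall back to a literal parse for single-quoted payloads, and reject anything oversized or too deeply nested.

```python
import ast
import json

# Hypothetical limits, chosen only for illustration.
MAX_ARG_CHARS = 16_384
MAX_DEPTH = 16


def _too_deep(value, limit):
    """Iteratively check whether nested containers exceed `limit` levels."""
    stack = [(value, 1)]
    while stack:
        node, level = stack.pop()
        if level > limit:
            return True
        if isinstance(node, dict):
            stack.extend((child, level + 1) for child in node.values())
        elif isinstance(node, (list, tuple)):
            stack.extend((child, level + 1) for child in node)
    return False


def parse_tool_arguments(raw):
    """Return the arguments as a dict, or None if the payload is rejected."""
    if len(raw) > MAX_ARG_CHARS:
        return None  # bound the input size before parsing anything
    try:
        value = json.loads(raw)  # strict JSON first
    except (json.JSONDecodeError, RecursionError):
        try:
            # Fallback for single-quoted, JSON-like payloads.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError, TypeError, MemoryError, RecursionError):
            return None
    if not isinstance(value, dict) or _too_deep(value, MAX_DEPTH):
        return None
    return value
```

A caller that receives `None` can fall back to treating the model output as plain text rather than a tool call.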

Improvements and Fixes

Added MiniMax M3 model discovery, native-text VLM support, sparse-attention patching, position-id handling, output parsing, tool-call filtering, and cache/type-handler support. (#1875)
Exposed nested VLM language models through the oQ sanitize-plan proxy so MiniMax-style nested VLMs can be quantized. (#1881)
Fixed native MTP row alignment, sampler/logits-processor alignment, and late-join handling. (#1824, #1845, #1879)
Fixed VLM MTP routing through the adapter for mRoPE position encoding. (#1839)
Fixed Gemma 4 MCP-namespaced tool calls and bounded Gemma 4 JSON argument parsing. (#1854)
Fixed missing tool descriptions in strict chat templates and preserved malformed Harmony channels as visible assistant output. (#1876)
Fixed and hardened thinking_budget forwarding on /v1/completions, and honored the request's max_tokens when force_sampling is enabled. (#1821, #1844, #1857)
Improved TurboQuant prefix-cache restore, hybrid-cache eligibility, sink-aware fallback behavior, and long-prefill handling. (#1842)
Preserved prefix cache for mid-system messages and cached system notes for templates without native mid-system support. (#1826)
Improved oversized prefill rejection before streaming and preserved typed prefill-memory errors through server streaming paths. (#1829)
Added binding-ceiling-aware prefill rejection messages and preserved hot-cache preflight diagnostics. (#1452)
Fixed menu bar responsiveness under server load and improved Codex App / Hermes launch integrations. (#1852, #1878, #1880)
Enabled tool calling on the serial diffusion lane and fixed diffusion benchmark / embedding edge cases. (#1837)
Updated the mlx-lm and mlx-vlm pins for the compatibility fixes used by this release candidate.
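
For readers who want to exercise the /v1/completions fixes listed above, the following sketch builds a request that sets both max_tokens and thinking_budget. Several details are assumptions rather than documented behavior: the base URL and port, the model name, and the placement of thinking_budget as a top-level field in the JSON body. The helper name `build_completion_request` is hypothetical.

```python
import json
import urllib.request

# Assumption: adjust to wherever your oMLX server is listening.
BASE_URL = "http://localhost:8000"


def build_completion_request(prompt, max_tokens, thinking_budget, model="my-model"):
    """Build (but do not send) a POST request for /v1/completions."""
    payload = {
        "model": model,  # placeholder model name
        "prompt": prompt,
        "max_tokens": max_tokens,
        # Assumption: thinking_budget is passed as a top-level field.
        "thinking_budget": thinking_budget,
    }
    return urllib.request.Request(
        BASE_URL + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_completion_request("Explain KV caching.", max_tokens=128, thinking_budget=256)
```

Passing `request` to `urllib.request.urlopen` sends it to a running server; the response shape is not shown here because it depends on the server configuration.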

Thanks

Special thanks to @ivanfioravanti for the initial MiniMax M3 support PR in mlx-vlm, and to @Blaizzy for the awesome mlx-vlm work that makes this path possible.

Thanks to @richgoodson, @efortin, @cfbraun, @fparrav, @gilby, @jimicze, @isaac-cf-wong, @imi4u36d, @scubamount, and @chenqianhe for the reports and fixes that shaped this release candidate.

Full Changelog: v0.4.4rc1...v0.4.4rc2
