Download the DMG that matches your macOS version (Sequoia or Tahoe).
If you're on an M5 Mac, you must use the `macos26-tahoe` DMG for M5 Neural Accelerator support.
Hotfix (v0.2.17)
v0.2.16 has been removed due to this bug. All v0.2.16 changes are included below.
New Models
- Nemotron Super hybrid architecture support — `layers_block_type` config, MoE latent projections (upstream mlx-lm 73c8550)
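Hybrid architectures like this one declare, per layer, which block type to build. The sketch below illustrates the general dispatch pattern a `layers_block_type` list enables; the block-type names and dispatch logic here are assumptions for illustration, not oMLX's or mlx-lm's actual implementation.

```python
# Hypothetical sketch: building a hybrid model from a per-layer
# block-type list such as Nemotron's `layers_block_type` config.
# The values "attention"/"mamba" are assumed for illustration.

def build_layers(config: dict) -> list:
    layers = []
    for i, block_type in enumerate(config["layers_block_type"]):
        if block_type == "attention":
            layers.append(("attention", i))  # stand-in for an attention block
        elif block_type == "mamba":
            layers.append(("mamba", i))      # stand-in for an SSM/Mamba block
        else:
            raise ValueError(f"unknown block type: {block_type}")
    return layers

config = {"layers_block_type": ["attention", "mamba", "mamba", "attention"]}
print(build_layers(config))
```

The point of the pattern is that one config key fully determines the layer stack, so loaders can support new hybrid mixes without new model classes.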
Bug Fixes
- fix GatedDelta/SSM state precision (Qwen3.5, Qwen3-next, Kimi-Linear) — state buffers now use float32 instead of the input dtype for numerical correctness. upstream fix (735a43b)
  - oMLX now produces token-identical output to mlx-lm's BatchGenerator regardless of whether the SSD cache is on or off (verified on KVCache, ArraysCache, and ArraysCache MoE models)
- fix BatchRotatingKVCache lazy evaluation ordering — `left_padding`/`offset` could hold stale values due to MLX deferred evaluation. upstream fix (89c430a) adds `mx.depends()` to enforce correct ordering
- fix SuScaledRoPE/YarnRoPE mutating input arrays in-place — upstream fix (2146e4e) shallow-copies before modification
- fix Qwen3-Coder tool parser crashing on Python-style quoted dicts — falls back to `ast.literal_eval` when `json.loads` fails (upstream ed69f83)
- fix `api_key` exposed in stats response and integration CLI commands (#256)
- fix redundant `/admin/api/login` calls on every stats poll — reuses the admin session instead
- fix sub-block cache (`< block_size`) now surfaced in runtime cache observability instead of showing a misleading `0 indexed blocks` (#256)
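The SSM state-precision fix above is an instance of a general failure mode, which can be shown with plain NumPy (this is an illustration of the problem class, not the actual GatedDelta/SSM code): accumulating many small recurrent updates in the input dtype (e.g. float16) drifts or stalls, while a float32 state buffer stays accurate.

```python
import numpy as np

# Simulate a recurrent state updated with many small deltas, as an
# SSM-style cache does. Keeping the running state in float16 (the
# input dtype) loses the updates once the state grows; a float32
# state buffer accumulates correctly.
delta = np.float16(0.001)

state_f16 = np.float16(0.0)
state_f32 = np.float32(0.0)
for _ in range(10_000):
    state_f16 = np.float16(state_f16 + delta)        # rounds every step
    state_f32 = np.float32(state_f32 + np.float32(delta))

print(float(state_f16))  # far from 10.0: updates stall once spacing > delta
print(float(state_f32))  # close to 10.0
```

This is why the fix keeps state buffers in float32 regardless of the model's compute dtype.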
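The BatchRotatingKVCache fix guards against a general lazy-evaluation hazard: reading a value that a deferred computation has not yet written. MLX's `mx.depends()` enforces ordering on its computation graph; the toy simulation below only mimics the idea with an explicit thunk and is not the upstream code.

```python
class Lazy:
    """A deferred computation: nothing runs until eval() is called."""
    def __init__(self, fn):
        self.fn = fn
        self.done = False
    def eval(self):
        if not self.done:
            self.fn()
            self.done = True

state = {"offset": 0}
update = Lazy(lambda: state.update(offset=8))  # deferred write to the cache

# Buggy order: read before forcing the update, so the value is stale.
stale = state["offset"]   # 0

# Fixed order: force the dependency first (the role mx.depends() plays).
update.eval()
fresh = state["offset"]   # 8
print(stale, fresh)
```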
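The RoPE fix follows a standard rule: never mutate a caller's array in place; copy before modifying. A minimal NumPy sketch of the pattern (function names are assumed, not the upstream code):

```python
import numpy as np

def scale_in_place(x: np.ndarray, factor: float) -> np.ndarray:
    x *= factor          # buggy: mutates the caller's array
    return x

def scale_copy(x: np.ndarray, factor: float) -> np.ndarray:
    x = x.copy()         # fixed: work on a copy; caller's data is untouched
    x *= factor
    return x

a = np.ones(3)
scale_in_place(a, 2.0)
print(a)    # [2. 2. 2.]  (caller's buffer was clobbered)

b = np.ones(3)
out = scale_copy(b, 2.0)
print(b)    # [1. 1. 1.]  (untouched)
print(out)  # [2. 2. 2.]
```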
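The tool-parser fix uses a common fallback pattern: try strict JSON first, then parse Python-style literals (single quotes, `True`/`None`) with `ast.literal_eval`, which is safe because it only evaluates literals. A self-contained sketch of the fallback, not the actual parser code:

```python
import ast
import json

def parse_args(raw: str):
    """Parse tool-call arguments that may be JSON or a Python literal."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Python-style dicts like {'a': 1} are invalid JSON but are
        # valid (and safe) to parse with ast.literal_eval.
        return ast.literal_eval(raw)

print(parse_args('{"path": "main.py"}'))   # valid JSON
print(parse_args("{'path': 'main.py'}"))   # Python-style quotes
```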
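The `api_key` fix reflects a standard redaction pattern: mask secret fields before a stats payload leaves the server. The field names and shape below are assumptions for illustration, not oMLX's actual response schema.

```python
SENSITIVE_KEYS = {"api_key"}  # assumed field name, per the fix above

def redact(payload: dict) -> dict:
    """Return a copy of a stats payload with sensitive fields masked."""
    return {
        k: ("***" if k in SENSITIVE_KEYS else v)
        for k, v in payload.items()
    }

stats = {"requests": 42, "api_key": "sk-secret"}
print(redact(stats))  # {'requests': 42, 'api_key': '***'}
```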
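The login-polling fix is a simple session-caching pattern: authenticate once, then reuse the session for every subsequent poll. A hypothetical sketch with assumed names; the real client talks to `/admin/api/login` over HTTP.

```python
class AdminClient:
    """Hypothetical client that reuses one admin session across polls."""

    def __init__(self):
        self._session = None
        self.login_calls = 0

    def _login(self) -> str:
        self.login_calls += 1        # stand-in for POST /admin/api/login
        return "session-token"

    def poll_stats(self) -> str:
        if self._session is None:    # log in only on first use
            self._session = self._login()
        return f"stats via {self._session}"

client = AdminClient()
for _ in range(5):
    client.poll_stats()
print(client.login_calls)  # 1, not one login per poll
```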
Dependency Updates
- mlx-lm `4a21ffd` (0.31.1) → `564281f` (0.31.2)
New Contributors
Thanks to @yes999zc for their contributions!
Full changelog: v0.2.15...v0.2.17