Download the DMG that matches your macOS version (sequoia or tahoe).
If you're on an M5 Mac, you must use the macos26-tahoe DMG for M5 Neural Accelerator support.
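As a rough illustration of the selection rule above, here is a minimal Python sketch that maps a macOS version string to the DMG flavor. The function name `pick_dmg_flavor` and the `is_m5` flag are hypothetical and not part of this project. The sketch assumes Sequoia corresponds to macOS 15 and Tahoe to macOS 26, and it returns only the flavor label, not a file name.

```python
import platform


def pick_dmg_flavor(mac_version: str, is_m5: bool = False) -> str:
    """Return 'sequoia' or 'tahoe' for a version string such as '15.6.1'.

    Hypothetical helper for illustration only.
    """
    # Per the release notes, M5 Macs must use the Tahoe DMG.
    if is_m5:
        return "tahoe"
    major = int(mac_version.split(".")[0])
    # Assumption of this sketch: anything newer than macOS 15 counts as Tahoe.
    if major > 15:
        return "tahoe"
    if major == 15:
        return "sequoia"
    raise ValueError(f"no DMG listed for macOS {mac_version}")


if __name__ == "__main__":
    version = platform.mac_ver()[0]  # empty string when not on macOS
    if version:
        print(pick_dmg_flavor(version))
    else:
        print("not running on macOS")
```

The M5 check has to be supplied by the caller, since the standard library does not expose the chip generation directly.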
New Models
- Qwen3-VL embedding and reranking support (via mlx-embeddings 6e2ef52)
- Moondream3 vision-language model support (via mlx-vlm b7f853a)
Bug Fixes
- fix respect per-model max_tokens settings (#258)
- fix download stall timeout too short (120s → 300s) (#254)
- fix Qwen3.5 batch dimension mismatches under continuous batching (upstream mlx-vlm db3d558)
- fix Qwen3-VL attention mask slicing with mx.array kv_seq_len (upstream mlx-vlm)
- fix Qwen3-Omni integration (upstream mlx-vlm b7f853a)
Dependency Updates
- mlx >=0.29.2 → >=0.31.1
- mlx-embeddings 88522e2 → 6e2ef52
- mlx-vlm 348466f (0.3.13) → b7f853a (0.4.0)
full changelog: v0.2.13...v0.2.14