Download the DMG that matches your macOS version (sequoia or tahoe).
If you're on an M5 Mac, you must use the `macos26-tahoe` DMG for the M5 Neural Accelerator.
Bug Fixes
- fix boundary cache consistency for hybrid models (Qwen3.5, GatedDeltaNet) - aligned ArraysCache block size to prefill step size, eliminating numerical divergence between cache ON/OFF. also patched missing `cache.advance()` in mlx-lm's qwen3_5 forward pass (#260)
  tip for Qwen3.5 users: if reasoning (`enable_thinking`) is true, the model may emit EOS during tool calling and stop generation mid-turn. if you're using Qwen3.5 for agentic coding, go to model settings → Chat Template Kwargs, set `enable_thinking` to `false` and check `force`.
- fix RotatingKVCache block size - now uses window_size multiples (512-1024 range) instead of raw window_size, reducing SSD I/O overhead from many small block files
- fix embedding/reranker engines recompiling Metal compute graph on every request after short idle (#266)
- fix chat image viewer - replaced window.open() with modal overlay for base64 images, and preserved original images when editing messages (#264, #268)
- fix anthropic streaming thinking block not properly closed when model stops mid-think
- fix abort_request() hanging stream_outputs() forever - now signals consumer with abort error instead of silently removing collector
- fix OpenAI client compatibility - standard error format for /v1/* routes, case-insensitive model name resolution, accept stop as string or list
- fix "Attempted to free unknown block" warnings when all blocks fail validation during cache reconstruction (#272)
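
For readers curious about the abort_request() fix, here is a minimal sketch of the general pattern it describes: waking a blocked stream consumer with an error instead of only dropping its collector. This is not the project's actual code. `Engine`, `OutputCollector`, `RequestAborted`, `add_request`, and `emit` are hypothetical names chosen for illustration; only `abort_request()` and `stream_outputs()` appear in the note above, and their real signatures may differ.

```python
import asyncio


class RequestAborted(Exception):
    """Raised in the consumer when its request is aborted (hypothetical name)."""


class OutputCollector:
    """Per-request output queue; a hypothetical stand-in for the real collector."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()

    def put(self, item) -> None:
        self._queue.put_nowait(item)

    async def get(self):
        item = await self._queue.get()
        # An exception placed on the queue is re-raised in the consumer.
        if isinstance(item, Exception):
            raise item
        return item


class Engine:
    """Toy engine that only tracks collectors (assumption, not the real engine)."""

    def __init__(self) -> None:
        self._collectors = {}

    def add_request(self, request_id: str) -> None:
        self._collectors[request_id] = OutputCollector()

    def emit(self, request_id: str, token: str) -> None:
        self._collectors[request_id].put(token)

    def abort_request(self, request_id: str) -> None:
        collector = self._collectors.pop(request_id, None)
        if collector is not None:
            # Signal the waiting consumer. Removing the collector without
            # this would leave stream_outputs() awaiting the queue forever.
            collector.put(RequestAborted(request_id))

    async def stream_outputs(self, request_id: str):
        collector = self._collectors[request_id]
        while True:
            yield await collector.get()


async def consume(engine: Engine, request_id: str):
    """Collect tokens until the stream is aborted."""
    tokens = []
    try:
        async for token in engine.stream_outputs(request_id):
            tokens.append(token)
    except RequestAborted:
        return tokens, True
    return tokens, False


async def demo():
    engine = Engine()
    engine.add_request("req-1")
    task = asyncio.create_task(consume(engine, "req-1"))
    await asyncio.sleep(0)  # let the consumer start and block on the queue
    engine.emit("req-1", "hello")
    engine.abort_request("req-1")
    # The timeout guards against the hang this pattern is meant to prevent.
    return await asyncio.wait_for(task, timeout=1.0)
```

Running `asyncio.run(demo())` should return promptly rather than hang, because the abort error is queued behind any tokens that were already emitted.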
full changelog: v0.2.14...v0.2.15
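
As a rough illustration of the OpenAI client compatibility items, the sketch below shows one way a server could normalize `stop`, resolve model names case-insensitively, and build an OpenAI-style error body. It is an assumption-laden example rather than the project's implementation: `normalize_stop`, `resolve_model`, and `error_body` are hypothetical helper names, and the model names in the usage lines are placeholders.

```python
from typing import List, Optional, Sequence, Union


def normalize_stop(stop: Union[str, Sequence[str], None]) -> List[str]:
    """Accept `stop` as a single string, a list of strings, or None."""
    if stop is None:
        return []
    if isinstance(stop, str):
        return [stop]
    return list(stop)


def resolve_model(requested: str, loaded: Sequence[str]) -> Optional[str]:
    """Return the loaded model whose name matches, ignoring case."""
    if requested in loaded:  # an exact match wins
        return requested
    wanted = requested.casefold()
    for name in loaded:
        if name.casefold() == wanted:
            return name
    return None


def error_body(message: str, error_type: str,
               param: Optional[str] = None,
               code: Optional[str] = None) -> dict:
    """Build an error payload in the shape OpenAI clients expect."""
    return {
        "error": {
            "message": message,
            "type": error_type,
            "param": param,
            "code": code,
        }
    }


if __name__ == "__main__":
    print(normalize_stop("\n\n"))
    print(resolve_model("qwen3.5", ["Qwen3.5", "Other-Model"]))
    print(error_body("model not found", "invalid_request_error", param="model"))
```

Returning the canonical loaded name from `resolve_model` (rather than the client's spelling) keeps cache keys and logs consistent regardless of how the client capitalized the request.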