jundot/omlx v0.2.15 on GitHub

Download the DMG that matches your macOS version (sequoia or tahoe).
If you're on an M5 Mac, you must use the macos26-tahoe DMG to get M5 Neural Accelerator support.

Bug Fixes

fix boundary cache consistency for hybrid models (Qwen3.5, GatedDeltaNet) - aligned ArraysCache block size to prefill step size, eliminating numerical divergence between cache ON/OFF. also patched missing cache.advance() in mlx-lm's qwen3_5 forward pass (#260)

tip for Qwen3.5 users: if reasoning (enable_thinking) is true, the model may emit EOS during tool calling and stop generation mid-turn. if you're using Qwen3.5 for agentic coding, go to model settings → Chat Template Kwargs, set enable_thinking to false and check force.
fix RotatingKVCache block size - now uses window_size multiples (512-1024 range) instead of raw window_size, reducing SSD I/O overhead from many small block files
fix embedding/reranker engines recompiling Metal compute graph on every request after short idle (#266)
fix chat image viewer - replaced window.open() with modal overlay for base64 images, preserve original images when editing messages (#264, #268)
fix Anthropic streaming thinking block not being properly closed when model stops mid-think
fix abort_request() hanging stream_outputs() forever - now signals consumer with abort error instead of silently removing collector
fix OpenAI client compatibility - standard error format for /v1/* routes, case-insensitive model name resolution, accept stop as string or list
fix "Attempted to free unknown block" warnings when all blocks fail validation during cache reconstruction (#272)
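
To illustrate the abort_request() fix above, here is a minimal sketch of the general pattern: aborting pushes an error into the consumer's queue instead of only dropping the collector, so a waiting stream wakes up and exits. The names `RequestAborted`, `OutputCollector`, and `Engine` are hypothetical stand-ins for this example and are not omlx's actual classes; only `abort_request()` and `stream_outputs()` are names taken from the notes, and their real signatures may differ.

```python
import asyncio


class RequestAborted(Exception):
    """Hypothetical error delivered to a consumer whose request was aborted."""


class OutputCollector:
    """Hypothetical per-request output queue (not omlx's real class)."""

    def __init__(self) -> None:
        self._queue = asyncio.Queue()

    def put(self, item) -> None:
        self._queue.put_nowait(item)

    async def get(self):
        item = await self._queue.get()
        # An exception in the queue is a signal, not a token: re-raise it.
        if isinstance(item, Exception):
            raise item
        return item


class Engine:
    """Hypothetical engine that tracks one collector per request id."""

    def __init__(self) -> None:
        self.collectors = {}

    def add_request(self, request_id: str) -> OutputCollector:
        collector = OutputCollector()
        self.collectors[request_id] = collector
        return collector

    def abort_request(self, request_id: str) -> None:
        collector = self.collectors.pop(request_id, None)
        if collector is not None:
            # Wake the consumer with an error. Only removing the collector
            # would leave a pending get() waiting forever.
            collector.put(RequestAborted(request_id))


async def stream_outputs(collector: OutputCollector):
    """Yield outputs until the collector raises (for example, on abort)."""
    while True:
        yield await collector.get()


async def demo() -> list:
    engine = Engine()
    collector = engine.add_request("req-1")
    collector.put("hello")
    engine.abort_request("req-1")
    seen = []
    try:
        async for token in stream_outputs(collector):
            seen.append(token)
    except RequestAborted:
        seen.append("aborted")
    return seen


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the consumer still receives everything queued before the abort and then sees the error, which is the behavior a streaming HTTP handler would need to close the response cleanly.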

full changelog: v0.2.14...v0.2.15
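
For the OpenAI client compatibility fix listed above, the two input-handling changes (stop as string or list, case-insensitive model names) can be sketched roughly as follows. The helpers `normalize_stop` and `resolve_model` are hypothetical names for illustration, and the model names used with them are made up; this is not omlx's code.

```python
from typing import List, Optional, Union


def normalize_stop(stop: Union[str, List[str], None]) -> List[str]:
    """Accept a single stop string, a list of stop strings, or None."""
    if stop is None:
        return []
    if isinstance(stop, str):
        return [stop]
    return list(stop)


def resolve_model(requested: str, loaded: List[str]) -> Optional[str]:
    """Match a requested model name against loaded names, ignoring case."""
    by_folded = {name.casefold(): name for name in loaded}
    return by_folded.get(requested.casefold())
```

With these helpers, a client that sends `"stop": "</s>"` and one that sends `"stop": ["</s>"]` end up with the same list, and a request for `my-model` would resolve to a loaded model registered as `My-Model`.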