github jundot/omlx v0.2.15

one month ago

Download the DMG that matches your macOS version (Sequoia or Tahoe).
If you're on an M5 Mac, you must use the macos26-tahoe DMG to get M5 Neural Accelerator support.

Bug Fixes

  • fix boundary cache consistency for hybrid models (Qwen3.5, GatedDeltaNet) - aligned ArraysCache block size to prefill step size, eliminating numerical divergence between runs with the cache on and off. also patched a missing cache.advance() call in mlx-lm's qwen3_5 forward pass (#260)

    tip for Qwen3.5 users: if reasoning (enable_thinking) is enabled, the model may emit EOS during tool calling and stop generation mid-turn. if you're using Qwen3.5 for agentic coding, go to model settings → Chat Template Kwargs, set enable_thinking to false, and tick the force checkbox.

  • fix RotatingKVCache block size - now uses window_size multiples (512-1024 range) instead of raw window_size, reducing SSD I/O overhead from many small block files

  • fix embedding/reranker engines recompiling the Metal compute graph on every request after a short idle period (#266)

  • fix chat image viewer - replaced window.open() with a modal overlay for base64 images, and original images are now preserved when editing messages (#264, #268)

  • fix Anthropic streaming thinking block not being closed properly when the model stops mid-think

  • fix abort_request() hanging stream_outputs() forever - now signals consumer with abort error instead of silently removing collector

  • fix OpenAI client compatibility - standard error format for /v1/* routes, case-insensitive model name resolution, accept stop as string or list

  • fix "Attempted to free unknown block" warnings when all blocks fail validation during cache reconstruction (#272)
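
The OpenAI client compatibility fix above (accepting stop as a string or a list, and resolving model names case-insensitively) can be illustrated with a minimal sketch. This is not omlx's actual code: the function names normalize_stop and resolve_model and the model name in the usage note are hypothetical, chosen only to show the kind of normalization such a fix typically involves.

```python
from typing import Optional, Union


def normalize_stop(stop: Union[str, list[str], None]) -> list[str]:
    """Coerce an OpenAI-style `stop` value into a list of strings.

    Hypothetical helper: OpenAI clients may send `stop` as a single
    string, a list of strings, or omit it entirely.
    """
    if stop is None:
        return []
    if isinstance(stop, str):
        return [stop]
    return list(stop)


def resolve_model(requested: str, loaded: list[str]) -> Optional[str]:
    """Return the loaded model name matching `requested`, ignoring case.

    Hypothetical helper: an exact match wins first, so two names that
    differ only by case remain individually addressable.
    """
    if requested in loaded:
        return requested
    wanted = requested.casefold()
    for name in loaded:
        if name.casefold() == wanted:
            return name
    return None
```

With these helpers, normalize_stop("END") and normalize_stop(["END"]) both yield a one-element list, and resolve_model("example-model-4bit", ["Example-Model-4bit"]) returns the name as it is actually registered, so downstream code only has to handle one shape of input.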

full changelog: v0.2.14...v0.2.15
