v0.2.24 Release Notes
Critical Bug Fixes
- Fix VLM loading failure on all Qwen3.5 models —
transformers5.4.0 (released March 27) rewroteQwen2VLImageProcessorfrom numpy/PIL to torch/torchvision backend, breaking VLM loading in environments without torch. Every Qwen3.5 model failed VLM init and fell back to LLM, causing double model loading and ~2x peak memory usage. Pinnedtransformers>=5.0.0,<5.4.0. (#431) - Fix IOKit kernel panic (completeMemory prepare count underflow) — Immediate
mx.clear_cache()after request completion raced with IOKit's asynchronous reference count cleanup, causing kernel panics on M1/M2/M3 devices. Deferred Metal buffer clearing by 8 generation steps to allow IOKit callbacks to complete. (#435) - Fix swap during model load with memory guard enabled —
mx.set_memory_limit()caused MLX to aggressively reclaim cached buffers during model loading, creating alloc/free churn that pushed the system into swap. Removed Metal-level memory limits entirely since all memory protection usesmx.get_active_memory()polling instead. (#429)