jundot/omlx v0.2.24 on GitHub

v0.2.24 Release Notes

Fix VLM loading failure on all Qwen3.5 models: transformers 5.4.0 (released March 27) rewrote Qwen2VLImageProcessor from a numpy/PIL backend to a torch/torchvision backend, which broke VLM loading in environments without torch. Every Qwen3.5 model failed VLM initialization and fell back to loading as an LLM, so the model was loaded twice and peak memory usage roughly doubled. Pinned transformers>=5.0.0,<5.4.0. (#431)
Fix IOKit kernel panic (completeMemory prepare count underflow): calling mx.clear_cache() immediately after a request completed raced with IOKit's asynchronous reference-count cleanup, causing kernel panics on M1/M2/M3 devices. Metal buffer clearing is now deferred by 8 generation steps so that IOKit callbacks can complete first. (#435)
Fix swap during model load with memory guard enabled: mx.set_memory_limit() caused MLX to aggressively reclaim cached buffers during model loading, and the resulting alloc/free churn pushed the system into swap. Metal-level memory limits are removed entirely, since all memory protection relies on polling mx.get_active_memory() instead. (#429)
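
The deferred clearing described in #435 can be pictured as a step countdown. The following is a minimal sketch, not the actual omlx code: the class name DeferredCacheClearer and its methods are hypothetical, and the clearing function is passed in as a plain callable so the example runs without MLX installed. In real use the callable would presumably be mx.clear_cache.

```python
from typing import Callable, Optional


class DeferredCacheClearer:
    """Hypothetical helper: run a clear callback N generation steps after it is requested."""

    def __init__(self, clear_fn: Callable[[], None], delay_steps: int = 8) -> None:
        self._clear_fn = clear_fn          # assumed to be something like mx.clear_cache
        self._delay_steps = delay_steps    # 8 matches the delay named in the release note
        self._remaining: Optional[int] = None  # None means no clear is pending

    def request_clear(self) -> None:
        # Called when a request completes; starts (or restarts) the countdown
        # instead of clearing immediately.
        self._remaining = self._delay_steps

    def on_step(self) -> bool:
        # Called once per generation step; returns True if the cache was cleared.
        if self._remaining is None:
            return False
        self._remaining -= 1
        if self._remaining > 0:
            return False
        self._remaining = None
        self._clear_fn()
        return True
```

A scheduler loop would call request_clear() when a request finishes and on_step() after every generation step. Restarting the countdown on each new request is a design choice of this sketch, not something the release note specifies.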

Fix GPTQ performance for large MoE models
Fix VLM tokenizer eager loading causing OOM during oQ quantization
Harden error recovery to prevent SIGABRT from secondary Metal errors during cleanup (#429, #435)
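
For the memory guard change in #429, a polling-based check might look roughly like the sketch below. This is an illustration under assumptions, not omlx's implementation: PollingMemoryGuard, headroom, and can_allocate are hypothetical names, and the memory reader is injected as a callable (in practice presumably mx.get_active_memory) so the example is self-contained.

```python
from typing import Callable


class PollingMemoryGuard:
    """Hypothetical guard that polls active memory instead of setting a Metal-level limit."""

    def __init__(self, get_active_memory: Callable[[], int], limit_bytes: int) -> None:
        self._get_active_memory = get_active_memory  # assumed: returns bytes currently in use
        self._limit_bytes = limit_bytes              # soft limit enforced by the caller

    def headroom(self) -> int:
        # Bytes left under the soft limit at the moment of polling (may be negative).
        return self._limit_bytes - self._get_active_memory()

    def can_allocate(self, needed_bytes: int) -> bool:
        # True if the requested amount fits under the soft limit right now.
        return needed_bytes <= self.headroom()
```

Because the check happens in application code, the allocator is never told to reclaim cached buffers, which is the behavior the release note identifies as the cause of alloc/free churn during model load.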