github jundot/omlx v0.2.24


v0.2.24 Release Notes

Critical Bug Fixes

  • Fix VLM loading failure on all Qwen3.5 models — transformers 5.4.0 (released March 27) rewrote Qwen2VLImageProcessor from a numpy/PIL backend to a torch/torchvision backend, breaking VLM loading in environments without torch. Every Qwen3.5 model failed VLM init and fell back to LLM-only mode, causing double model loading and ~2x peak memory usage. Pinned transformers>=5.0.0,<5.4.0. (#431)
  • Fix IOKit kernel panic (completeMemory prepare count underflow) — Immediate mx.clear_cache() after request completion raced with IOKit's asynchronous reference count cleanup, causing kernel panics on M1/M2/M3 devices. Deferred Metal buffer clearing by 8 generation steps to allow IOKit callbacks to complete. (#435)
  • Fix swap during model load with memory guard enabled — mx.set_memory_limit() caused MLX to aggressively reclaim cached buffers during model loading, creating alloc/free churn that pushed the system into swap. Removed Metal-level memory limits entirely; all memory protection now uses mx.get_active_memory() polling instead. (#429)
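The transformers pin in the first fix can also be enforced with a defensive runtime check before attempting VLM init. This is a hypothetical sketch, not omlx's actual code; the function names are illustrative:

```python
def parse_version(v: str) -> tuple:
    """Parse a 'major.minor.patch' string into a comparable int tuple."""
    return tuple(int(part) for part in v.split(".")[:3])

def transformers_is_compatible(version: str) -> bool:
    """Return True if the installed transformers version falls inside the
    pinned range >=5.0.0,<5.4.0 (5.4.0 moved Qwen2VLImageProcessor to a
    torch/torchvision backend, which fails without torch installed)."""
    v = parse_version(version)
    return (5, 0, 0) <= v < (5, 4, 0)
```

With a check like this, an incompatible environment can fail fast with a clear error instead of silently falling back to LLM-only mode and double-loading the model.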
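The deferred-clear fix for the IOKit race amounts to a countdown: instead of calling mx.clear_cache() the moment a request finishes, the clear is scheduled and executed only after a fixed number of subsequent generation steps. A minimal sketch, with the clear function injected so it stands in for mx.clear_cache() (class and method names are illustrative, not omlx's API):

```python
class DeferredCacheClearer:
    """Defer Metal buffer clearing by a fixed number of generation steps so
    IOKit's asynchronous reference-count cleanup can complete first."""

    def __init__(self, clear_fn, delay_steps: int = 8):
        self.clear_fn = clear_fn      # stands in for mx.clear_cache
        self.delay_steps = delay_steps
        self._pending = []            # countdowns for completed requests

    def on_request_complete(self):
        """Schedule a clear instead of clearing immediately."""
        self._pending.append(self.delay_steps)

    def on_generation_step(self):
        """Tick all pending countdowns; fire the clear once one expires."""
        self._pending = [n - 1 for n in self._pending]
        if any(n <= 0 for n in self._pending):
            self._pending = [n for n in self._pending if n > 0]
            self.clear_fn()
```

The delay of 8 steps matches the figure in the release note; the assumption is that 8 generation steps is comfortably longer than IOKit's cleanup latency.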
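The swap fix replaces a hard Metal limit with polling: new work is admitted only while observed active memory stays under a soft limit, so MLX is never forced to reclaim cached buffers mid-load. A sketch under stated assumptions — the memory getter is injected in place of mx.get_active_memory(), and the class is hypothetical, not omlx's actual guard:

```python
class MemoryGuard:
    """Poll active memory instead of setting a hard Metal limit
    (mx.set_memory_limit), which caused alloc/free churn during load."""

    def __init__(self, get_active_memory, limit_bytes: int):
        self.get_active_memory = get_active_memory  # stands in for mx.get_active_memory
        self.limit_bytes = limit_bytes

    def can_admit_request(self) -> bool:
        """Admit new work only while active memory is under the soft limit.
        In-flight work is never interrupted, so no reclaim churn occurs."""
        return self.get_active_memory() < self.limit_bytes
```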

Bug Fixes

  • Fix GPTQ performance for large MoE models
  • Fix VLM tokenizer eager loading causing OOM during oQ quantization
  • Harden error recovery to prevent SIGABRT from secondary Metal errors during cleanup (#429, #435)
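The eager-loading OOM in the second bullet points at a standard lazy pattern: construct the tokenizer only on first access, so a quantization pass that never tokenizes pays no memory cost for it. A hypothetical sketch using functools.cached_property (the wrapper class and factory are illustrative, not omlx's code):

```python
from functools import cached_property

class VLMWrapper:
    """Load the tokenizer lazily so passes that never tokenize
    (e.g. quantization) do not pay its memory cost up front."""

    def __init__(self, tokenizer_factory):
        # tokenizer_factory would typically wrap a from_pretrained call
        self._tokenizer_factory = tokenizer_factory

    @cached_property
    def tokenizer(self):
        # Constructed on first access, then cached for later calls.
        return self._tokenizer_factory()
```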
