jundot/omlx v0.1.10 on GitHub

Highlights

Qwen 3.5 SSD Caching Support

Qwen 3.5's powerful hybrid architecture (GatedDeltaNet + Attention) is now fully supported with SSD caching. Speed up multi-turn conversations with a persistent cache and experience real agentic coding on your Mac with oMLX!
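These notes don't describe how oMLX keys its SSD cache. The following is only a generic sketch of why a persistent prefix cache helps multi-turn chats: each new turn resends the earlier conversation, so its leading blocks can be looked up instead of recomputed. It assumes full-block keys chained with `hashlib.sha256`. `block_keys`, `reusable_tokens`, and the tiny block size are hypothetical illustrations, not oMLX APIs.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical tiny block size, for illustration only


def block_keys(token_ids: list[int], block_size: int = BLOCK_SIZE) -> list[str]:
    """Return one key per *full* block, each chained on the previous key.

    hashlib digests are identical in every process, unlike the built-in
    hash() for str and bytes, which is salted per process. Stable keys are
    what let a cache written to disk be found again after a restart.
    """
    keys: list[str] = []
    parent = b""
    full = len(token_ids) - len(token_ids) % block_size  # drop the partial tail
    for start in range(0, full, block_size):
        block = token_ids[start:start + block_size]
        h = hashlib.sha256(parent)  # chain: a key depends on the whole prefix
        h.update(b",".join(str(t).encode() for t in block))
        parent = h.digest()
        keys.append(h.hexdigest())
    return keys


def reusable_tokens(cached: set[str], token_ids: list[int], block_size: int = BLOCK_SIZE) -> int:
    """Count leading tokens whose blocks are already present in the cache."""
    hits = 0
    for key in block_keys(token_ids, block_size):
        if key not in cached:
            break  # the first miss ends the reusable prefix
        hits += 1
    return hits * block_size


turn1 = list(range(10))                     # 10 tokens -> 2 full blocks stored
cache = set(block_keys(turn1))
turn2 = turn1 + [100, 101, 102, 103, 104]   # next turn extends the same prefix
print(reusable_tokens(cache, turn2))        # 8
```

Only the first two full blocks of the second turn match, so 8 of its 15 tokens could be served from the cache in this sketch.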

What's New

Features

Per-block boundary snapshots for hybrid cache models (ArraysCache + KVCache)
Auto-enlarge block size (256 → 1024) for ArraysCache models to optimize cache performance and reusability
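Assuming one boundary snapshot is kept per full block (these notes don't spell out the exact policy), the effect of the larger block size on hybrid models can be sketched as below. `LayerCache`, `pick_block_size`, and `boundary_snapshots` are hypothetical names, not oMLX APIs; only the 256 and 1024 values come from the note above.

```python
from dataclasses import dataclass

DEFAULT_BLOCK_SIZE = 256
ARRAYS_CACHE_BLOCK_SIZE = 1024


@dataclass
class LayerCache:
    """Hypothetical stand-in for a per-layer cache; only its kind matters here."""
    kind: str  # "kv" for attention layers, "arrays" for recurrent-state layers


def pick_block_size(layers: list[LayerCache]) -> int:
    """Use the larger block size if any layer keeps an ArraysCache-style state."""
    if any(layer.kind == "arrays" for layer in layers):
        return ARRAYS_CACHE_BLOCK_SIZE
    return DEFAULT_BLOCK_SIZE


def boundary_snapshots(prompt_tokens: int, block_size: int) -> int:
    """Full-block boundaries in a prompt, i.e. snapshots if one is kept per block."""
    return prompt_tokens // block_size


hybrid = [LayerCache("arrays"), LayerCache("kv")]
print(pick_block_size(hybrid))           # 1024
print(boundary_snapshots(10_000, 256))   # 39
print(boundary_snapshots(10_000, 1024))  # 9
```

Under this assumption, a 10,000-token prompt needs 9 boundary snapshots instead of 39.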

Bug Fixes

Fix SSD cache not being reused across server restarts
Fix ArraysCache store/restore producing invalid placeholder states in intermediate blocks
Fix content array not being converted to a string in the assistant + tool_calls path (#42)
Fix non-PEP 440-compliant version string that caused pip install -e . to fail
Fix boundary snapshot OOM during long prefills by offloading to SSD (#48)
Fix missing __len__ on _BoundarySnapshotProvider, which prevented paged cache storage
Fix shared SchedulerConfig mutation across models causing incorrect block sizes
Fix NoneType debug log spam for non-hybrid KVCache models
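The notes don't say how the shared SchedulerConfig mutation was fixed. A common pattern for avoiding this class of bug is to derive a per-model copy instead of mutating one shared instance. The sketch below shows that pattern with `dataclasses.replace` on a hypothetical, minimal `SchedulerConfig` that has only a `block_size` field; `config_for_model` and `SHARED_DEFAULT` are also hypothetical.

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class SchedulerConfig:
    """Hypothetical minimal stand-in; not the real oMLX SchedulerConfig."""
    block_size: int = 256


SHARED_DEFAULT = SchedulerConfig()


def config_for_model(uses_arrays_cache: bool) -> SchedulerConfig:
    """Derive a per-model config without touching the shared default.

    Assigning to SHARED_DEFAULT.block_size instead would leak 1024 into every
    model loaded afterwards; frozen=True turns that mistake into an error.
    """
    if uses_arrays_cache:
        return dataclasses.replace(SHARED_DEFAULT, block_size=1024)
    return SHARED_DEFAULT


hybrid_cfg = config_for_model(uses_arrays_cache=True)
plain_cfg = config_for_model(uses_arrays_cache=False)
print(hybrid_cfg.block_size, plain_cfg.block_size)  # 1024 256
```

Freezing the dataclass is a design choice in this sketch: any accidental in-place edit raises `dataclasses.FrozenInstanceError` instead of silently changing block sizes for other models.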

Other

Add brew services support (#43)
Add Homebrew upgrade instructions to README

Note

v0.1.9 has been removed due to a critical memory issue (#48) affecting hybrid cache models during long prefills. v0.1.10 includes all v0.1.9 features plus the fixes above.
