Highlights
Qwen 3.5 SSD Caching Support
Qwen 3.5's powerful hybrid architecture (GatedDeltaNet + Attention) is now fully supported with SSD caching. Accelerate multi-turn conversations with persistent cache — experience real Agentic Coding on your Mac with oMLX!
What's New
Features
- Per-block boundary snapshots for hybrid cache models (ArraysCache + KVCache)
- Auto-enlarge block size (256 → 1024) for ArraysCache models to optimize cache performance and reusability
Bug Fixes
- Fix SSD cache not being reused across server restarts
- Fix ArraysCache cache store/restore producing invalid placeholder states in intermediate blocks
- Fix
contentarray not converted to string in assistant+tool_calls path (#42) - Fix PEP 440 non-compliant version string causing
pip install -e .failure - Fix boundary snapshot OOM during long prefills by offloading to SSD (#48)
- Fix
_BoundarySnapshotProvidermissing__len__preventing paged cache storage - Fix shared
SchedulerConfigmutation across models causing incorrect block sizes - Fix noisy
NoneTypedebug log spam for non-hybrid KVCache models
Other
- Add
brew servicessupport (#43) - Add Homebrew upgrade instructions to README
Note
v0.1.9 has been removed due to a critical memory issue (#48) affecting hybrid cache models during long prefills. v0.1.10 includes all v0.1.9 features plus the fixes above.