Highlight: Vision-Language Model Support with Tiered Caching
Starting with v0.2.0, oMLX sees the world - not just text.
Vision-language models now run natively on your Mac with the same continuous batching, paged KV cache, and SSD-tiered caching that powers text inference. Combined with production-grade tool calling, your Apple Silicon machine becomes a local inference server that doesn't just demo well - it actually works. Agentic coding, OpenClaw, multi-turn vision chat: real workloads, real performance, no cloud required.
For full v0.2.0 feature details, see v0.2.0 release notes.
New Features (v0.2.3)
Option to disable model memory limit
- Added option to disable model memory limit by setting the slider to 0 in the admin dashboard
Bug Fixes (v0.2.3)
Streaming response corruption on keep-alive connections (#80)
- Fixed `TransferEncodingError` when sending a second message in the same Open WebUI conversation over a local connection
- Removed duplicate ASGI `receive()` consumers that corrupted HTTP keep-alive state
- Replaced `BaseHTTPMiddleware` with a pure ASGI middleware to avoid streaming response pipe interference
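To illustrate why the middleware swap matters, here is a minimal sketch of a pure ASGI middleware. All names here (`HeaderMiddleware`, the `x-served-by` header) are hypothetical, not the actual oMLX code: the point is that the app's `receive` callable is passed through untouched, so no second consumer drains the request stream and keep-alive state stays intact.

```python
class HeaderMiddleware:
    """Hypothetical pure ASGI middleware: wraps send, never wraps receive."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        async def send_wrapper(message):
            # Only touch the response-start message; pass body chunks through,
            # which keeps streaming responses working.
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                headers.append((b"x-served-by", b"omlx"))
                message = {**message, "headers": headers}
            await send(message)

        # receive is forwarded as-is -- no duplicate receive() consumer.
        await self.app(scope, receive, send_wrapper)
```

By contrast, `BaseHTTPMiddleware` re-reads the request into its own `Request` object, which is one known way a second `receive()` consumer can appear in the pipeline.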
VLM batch generation shape mismatch (#79)
- Fixed shape mismatch error during VLM batch generation
Homebrew install failure (#78)
- Fixed `brew install` by making MCP an optional dependency
SSD cache fallback robustness (#74, #75)
- Fixed block metadata not being rolled back when SSD cache save fails
- Fixed SSD fallback block registration in paged cache
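The rollback fix follows a common pattern worth spelling out: if metadata is registered before the SSD write and the write then fails, the metadata must be undone, or the paged cache will believe a block exists on disk that was never written. The sketch below is a hypothetical illustration of that pattern (`BlockIndex`, `save_block_to_ssd`, and their signatures are invented for this example, not oMLX internals).

```python
class BlockIndex:
    """Minimal in-memory index standing in for paged-cache block metadata."""

    def __init__(self):
        self.on_ssd = set()

    def register(self, block_id):
        self.on_ssd.add(block_id)

    def unregister(self, block_id):
        self.on_ssd.discard(block_id)


def save_block_to_ssd(index, block_id, data, write_fn):
    """Register metadata, attempt the write, and roll back on failure."""
    index.register(block_id)
    try:
        write_fn(block_id, data)  # may raise (disk full, I/O error, ...)
    except OSError:
        index.unregister(block_id)  # roll back so metadata matches disk
        return False
    return True
```

Without the `unregister` call in the failure path, a later cache lookup would try to load a block that the save never persisted.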
Scheduler cache corruption recovery
- Broadened scheduler cache corruption recovery to also catch `AttributeError` and `ValueError`
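A minimal sketch of what broadening recovery looks like, under the assumption that the scheduler resets a request's cache and retries when a step raises a recoverable error (the names `run_step`, `step_fn`, and `reset_cache` are hypothetical, as is the choice of `RuntimeError` for the originally caught type):

```python
def run_step(step_fn, reset_cache):
    """Run one scheduler step, treating cache corruption as recoverable."""
    try:
        return step_fn()
    # AttributeError and ValueError are now caught alongside the original
    # error type, so malformed cache state triggers recovery, not a crash.
    except (RuntimeError, AttributeError, ValueError):
        reset_cache()  # drop the corrupted cache state
        return step_fn()  # retry once with a clean cache
```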
Full changelog: v0.2.2...v0.2.3
New Contributors
Thanks to @lyonsno for the contribution!