Highlight: Vision-Language Model Support with Tiered Caching
Starting with v0.2.0, oMLX sees the world — not just text.
Vision-language models now run natively on your Mac with the same continuous batching, paged KV cache, and SSD-tiered caching that powers text inference. Combined with production-grade tool calling, your Apple Silicon machine becomes a local inference server that doesn't just demo well — it actually works. Agentic coding, OpenClaw, multi-turn vision chat: real workloads, real performance, no cloud required.
For full v0.2.0 feature details, see v0.2.0 release notes.
Bug Fixes (v0.2.1)
VLM multi-turn image token mismatch (#69)
- Fixed "Image features and image tokens do not match: tokens: 0, features N" error when using a VLM with multi-turn conversation history
- oMLX now uses content-aware assignment that places image placeholders on whichever user turn actually contains image content, regardless of position
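The idea behind content-aware assignment can be sketched as follows. This is a hypothetical illustration, not oMLX's actual code: the function name, the `<image>` placeholder string, and the OpenAI-style message shape are all assumptions. The point is that placeholders are attached to whichever user turn actually carries image parts, rather than to a fixed position such as the last user message.

```python
# Hypothetical sketch of content-aware image placeholder assignment.
# Assumed: OpenAI-style messages where multimodal user content is a
# list of {"type": "image"|"text", ...} parts; "<image>" stands in for
# the model's real image placeholder token.

IMAGE_PLACEHOLDER = "<image>"

def assign_image_placeholders(messages):
    """Prepend one placeholder per image to the user turns that
    actually contain image content, regardless of turn position."""
    out = []
    for msg in messages:
        content = msg.get("content")
        if msg.get("role") == "user" and isinstance(content, list):
            n_images = sum(1 for p in content if p.get("type") == "image")
            text = " ".join(
                p.get("text", "") for p in content if p.get("type") == "text"
            )
            out.append({"role": "user",
                        "content": IMAGE_PLACEHOLDER * n_images + text})
        else:
            out.append(msg)  # assistant turns and plain-text turns pass through
    return out
```

With this approach, a text-only follow-up turn after an image turn no longer shifts the placeholder count away from where the image features actually are, which is what produced the token/feature mismatch.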
VLM abort crash during prefill
- Fixed crash when aborting a VLM request during the prefill phase (batch_generator None check)
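The guard pattern behind this fix is the standard one: during prefill the batch generator has not been created yet, so an abort must check for `None` before dereferencing it. The class and attribute names below are illustrative assumptions, not oMLX internals.

```python
# Minimal sketch of the None-check guard (hypothetical names).
# During prefill, batch_generator does not exist yet; aborting then
# previously dereferenced None and crashed.

class Request:
    def __init__(self):
        self.batch_generator = None  # created only once decode starts
        self.aborted = False

    def abort(self):
        self.aborted = True
        # Guard: only tear down the generator if it exists.
        if self.batch_generator is not None:
            self.batch_generator.close()

req = Request()
req.abort()  # safe even though prefill has produced no generator yet
```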
Responses API content format support
- Added input_text/input_image content type normalization for clients using the OpenAI Responses API format
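Normalization here amounts to mapping the Responses API's content part types onto the chat-completions-style types a server typically handles internally. The sketch below is an assumption about the shape of that mapping (the `TYPE_MAP` and field names are illustrative), not oMLX's actual implementation.

```python
# Hedged sketch: map Responses-API part types ("input_text",
# "input_image") to chat-completions-style types ("text", "image_url").
# Unknown types pass through unchanged.

TYPE_MAP = {"input_text": "text", "input_image": "image_url"}

def normalize_content(parts):
    """Return content parts with Responses-API types rewritten."""
    return [
        {**part, "type": TYPE_MAP.get(part.get("type"), part.get("type"))}
        for part in parts
    ]
```

This lets a client speaking the Responses API format reuse the same downstream message handling as chat-completions clients.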
Full changelog: v0.2.0...v0.2.1