Highlight: Vision-Language Model Support with Tiered Caching
Starting with v0.2.0, oMLX sees the world — not just text.
Vision-language models now run natively on your Mac with the same continuous batching, paged KV cache, and SSD-tiered caching that powers text inference. Combined with production-grade tool calling, your Apple Silicon machine becomes a local inference server that doesn't just demo well — it actually works. Agentic coding, OpenClaw, multi-turn vision chat: real workloads, real performance, no cloud required.
For full v0.2.0 feature details, see v0.2.0 release notes.
Bug Fixes (v0.2.1)
VLM multi-turn image token mismatch (#69)
- Fixed "Image features and image tokens do not match: tokens: 0, features N" error when using a VLM with multi-turn conversation history
- oMLX now uses content-aware assignment that places image placeholders on whichever user turn actually contains image content, regardless of position
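The idea behind content-aware assignment can be sketched as follows. This is a hypothetical illustration, not oMLX's actual code: the function name, the `<image>` placeholder string, and the OpenAI-style message shape are all assumptions. The point is that placeholders are attached to whichever user turn actually carries image parts, rather than to a fixed position such as the last user message.

```python
# Hypothetical sketch of content-aware image placeholder assignment.
# Assumed: OpenAI-style messages where multimodal user content is a
# list of {"type": "image"|"text", ...} parts; "<image>" stands in for
# the model's real image placeholder token.

IMAGE_PLACEHOLDER = "<image>"

def assign_image_placeholders(messages):
    """Prepend one placeholder per image to the user turns that
    actually contain image content, regardless of turn position."""
    out = []
    for msg in messages:
        content = msg.get("content")
        if msg.get("role") == "user" and isinstance(content, list):
            n_images = sum(1 for p in content if p.get("type") == "image")
            text = " ".join(
                p.get("text", "") for p in content if p.get("type") == "text"
            )
            out.append({"role": "user",
                        "content": IMAGE_PLACEHOLDER * n_images + text})
        else:
            out.append(msg)  # assistant turns and plain-text turns pass through
    return out
```

With this approach, a text-only follow-up turn after an image turn no longer shifts the placeholder count away from where the image features actually are, which is what produced the token/feature mismatch.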
VLM abort crash during prefill
- Fixed crash when aborting a VLM request during the prefill phase (batch_generator None check)
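The guard pattern behind this fix is the standard one: during prefill the batch generator has not been created yet, so an abort must check for `None` before dereferencing it. The class and attribute names below are illustrative assumptions, not oMLX internals.

```python
# Minimal sketch of the None-check guard (hypothetical names).
# During prefill, batch_generator does not exist yet; aborting then
# previously dereferenced None and crashed.

class Request:
    def __init__(self):
        self.batch_generator = None  # created only once decode starts
        self.aborted = False

    def abort(self):
        self.aborted = True
        # Guard: only tear down the generator if it exists.
        if self.batch_generator is not None:
            self.batch_generator.close()

req = Request()
req.abort()  # safe even though prefill has produced no generator yet
```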
Responses API content format support
- Added input_text/input_image content type normalization for clients using the OpenAI Responses API format
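Normalization here amounts to mapping the Responses API's content part types onto the chat-completions-style types a server typically handles internally. The sketch below is an assumption about the shape of that mapping (the `TYPE_MAP` and field names are illustrative), not oMLX's actual implementation.

```python
# Hedged sketch: map Responses-API part types ("input_text",
# "input_image") to chat-completions-style types ("text", "image_url").
# Unknown types pass through unchanged.

TYPE_MAP = {"input_text": "text", "input_image": "image_url"}

def normalize_content(parts):
    """Return content parts with Responses-API types rewritten."""
    return [
        {**part, "type": TYPE_MAP.get(part.get("type"), part.get("type"))}
        for part in parts
    ]
```

This lets a client speaking the Responses API format reuse the same downstream message handling as chat-completions clients.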
Full changelog: v0.2.0...v0.2.1