Highlight: Vision-Language Model Support with Tiered Caching
Starting with v0.2.0, oMLX sees the world — not just text.
Vision-language models now run natively on your Mac with the same continuous batching, paged KV cache, and SSD-tiered caching that powers text inference. Combined with production-grade tool calling, your Apple Silicon machine becomes a local inference server that doesn't just demo well — it actually works. Agentic coding, OpenClaw, multi-turn vision chat: real workloads, real performance, no cloud required.
For full v0.2.0 feature details, see the v0.2.0 release notes.
New Features (v0.2.2)
Model type override and VLM-to-LLM fallback (#72)
- Added model type override support — manually set a model as LLM or VLM regardless of auto-detection
- VLM models can fall back to LLM mode for text-only workloads
MCP tool auto-injection
- Added automatic MCP tool injection into chat completion requests
- Added MCP config loading from `settings.json` with `mcpServers` key support
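A `settings.json` using the `mcpServers` key might look like the following. The exact schema beyond the key name is an assumption, modeled on the common MCP server-config convention (`command`/`args` per server):

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/me/projects"]
    }
  }
}
```

Tools exposed by each configured server are then injected automatically into chat completion requests, with no per-request `tools` field needed from the client.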
Bug Fixes (v0.2.2)
RGBA image broadcast error
- Fixed crash when loading RGBA images by converting to RGB before processing
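The underlying issue is a shape mismatch: RGBA pixel data carries four channels where the vision pipeline expects three, so array operations fail to broadcast. A pure-Python sketch of the conversion idea (in practice one would use Pillow's `Image.convert("RGB")`; the function name here is illustrative):

```python
def rgba_to_rgb(pixels, background=(255, 255, 255)):
    """Composite RGBA pixels over an opaque background, yielding 3-channel RGB.

    Dropping the alpha channel up front keeps downstream arrays at the
    (H, W, 3) shape the model expects, avoiding broadcast errors.
    """
    out = []
    for r, g, b, a in pixels:
        alpha = a / 255
        out.append(tuple(round(c * alpha + bg * (1 - alpha))
                         for c, bg in zip((r, g, b), background)))
    return out

# Fully opaque red stays red; a fully transparent pixel becomes the background.
print(rgba_to_rgb([(255, 0, 0, 255), (0, 0, 255, 0)]))
# → [(255, 0, 0), (255, 255, 255)]
```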
MCP tool definition serialization
- Fixed Pydantic `ToolDefinition` not being converted to dict before MCP merge
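The fix follows a common pattern: normalize model objects to plain dicts before merging lists, so the combined result serializes cleanly. A sketch using a dataclass as a stand-in for the server's Pydantic model (with Pydantic v2 the conversion call would be `model_dump()`; `merge_tools` is a hypothetical helper, not oMLX's actual function):

```python
from dataclasses import dataclass, asdict

@dataclass
class ToolDefinition:
    # Stand-in for the Pydantic ToolDefinition model; real tool schemas
    # carry more fields (parameters, etc.).
    name: str
    description: str

def merge_tools(request_tools, mcp_tools):
    # Convert any model objects to plain dicts before merging, so the
    # combined tool list is JSON-serializable end to end.
    as_dicts = [asdict(t) if isinstance(t, ToolDefinition) else t
                for t in request_tools]
    return as_dicts + mcp_tools

merged = merge_tools([ToolDefinition("get_time", "Current time")],
                     [{"name": "mcp_tool"}])
print(merged)
# → [{'name': 'get_time', 'description': 'Current time'}, {'name': 'mcp_tool'}]
```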
Admin dashboard layout
- Fixed repetition penalty label abbreviation and reordered sampling parameter row to top_p / top_k / rep_penalty
Full changelog: v0.2.1...v0.2.2