Highlight: Vision-Language Model Support with Tiered Caching
Starting with v0.2.0, oMLX sees the world — not just text.
Vision-language models now run natively on your Mac with the same continuous batching, paged KV cache, and SSD-tiered caching that powers text inference. Combined with production-grade tool calling, your Apple Silicon machine becomes a local inference server that doesn't just demo well — it actually works. Agentic coding, OpenClaw, multi-turn vision chat: real workloads, real performance, no cloud required.
For full v0.2.0 feature details, see the v0.2.0 release notes.
New Features (v0.2.2)
Model type override and VLM-to-LLM fallback (#72)
- Added model type override support — manually set a model as LLM or VLM regardless of auto-detection
- VLM models can fall back to LLM mode for text-only workloads
MCP tool auto-injection
- Added automatic MCP tool injection into chat completion requests
- Added MCP config loading from `settings.json` with `mcpServers` key support
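A `settings.json` using the `mcpServers` key might look like the following. The exact schema beyond the key name is an assumption, modeled on the common MCP server-config convention (`command`/`args` per server):

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/me/projects"]
    }
  }
}
```

Tools exposed by each configured server are then injected automatically into chat completion requests, with no per-request `tools` field needed from the client.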
Bug Fixes (v0.2.2)
RGBA image broadcast error
- Fixed crash when loading RGBA images by converting to RGB before processing
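The underlying issue is a shape mismatch: RGBA pixel data carries four channels where the vision pipeline expects three, so array operations fail to broadcast. A pure-Python sketch of the conversion idea (in practice one would use Pillow's `Image.convert("RGB")`; the function name here is illustrative):

```python
def rgba_to_rgb(pixels, background=(255, 255, 255)):
    """Composite RGBA pixels over an opaque background, yielding 3-channel RGB.

    Dropping the alpha channel up front keeps downstream arrays at the
    (H, W, 3) shape the model expects, avoiding broadcast errors.
    """
    out = []
    for r, g, b, a in pixels:
        alpha = a / 255
        out.append(tuple(round(c * alpha + bg * (1 - alpha))
                         for c, bg in zip((r, g, b), background)))
    return out

# Fully opaque red stays red; a fully transparent pixel becomes the background.
print(rgba_to_rgb([(255, 0, 0, 255), (0, 0, 255, 0)]))
# → [(255, 0, 0), (255, 255, 255)]
```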
MCP tool definition serialization
- Fixed Pydantic `ToolDefinition` not being converted to dict before MCP merge
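The fix follows a common pattern: normalize model objects to plain dicts before merging lists, so the combined result serializes cleanly. A sketch using a dataclass as a stand-in for the server's Pydantic model (with Pydantic v2 the conversion call would be `model_dump()`; `merge_tools` is a hypothetical helper, not oMLX's actual function):

```python
from dataclasses import dataclass, asdict

@dataclass
class ToolDefinition:
    # Stand-in for the Pydantic ToolDefinition model; real tool schemas
    # carry more fields (parameters, etc.).
    name: str
    description: str

def merge_tools(request_tools, mcp_tools):
    # Convert any model objects to plain dicts before merging, so the
    # combined tool list is JSON-serializable end to end.
    as_dicts = [asdict(t) if isinstance(t, ToolDefinition) else t
                for t in request_tools]
    return as_dicts + mcp_tools

merged = merge_tools([ToolDefinition("get_time", "Current time")],
                     [{"name": "mcp_tool"}])
print(merged)
# → [{'name': 'get_time', 'description': 'Current time'}, {'name': 'mcp_tool'}]
```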
Admin dashboard layout
- Fixed repetition penalty label abbreviation and reordered sampling parameter row to top_p / top_k / rep_penalty
Full changelog: v0.2.1...v0.2.2