Highlight: Vision-Language Model Support with Tiered Caching
Starting with v0.2.0, oMLX sees the world — not just text.
Vision-language models now run natively on your Mac with the same continuous batching, paged KV cache, and SSD-tiered caching that powers text inference. Combined with production-grade tool calling, your Apple Silicon machine becomes a local inference server that doesn't just demo well — it actually works. Agentic coding, OpenClaw, multi-turn vision chat: real workloads, real performance, no cloud required.
What's New
VLM Engine (omlx/engine/vlm.py, omlx/models/vlm.py)
- Vision-Language Model engine via mlx-vlm integration for vision encoding + mlx-lm `BatchGenerator` for inference
- `VLMModelAdapter` wrapping the VLM's `language_model` for full `BatchGenerator` compatibility
- Batched VLM prefill with per-UID embeddings in `_BoundarySnapshotBatchGenerator`
- Chunked prefill support with embedding offset tracking for large vision inputs
- Prefix cache and paged cache support for VLM requests (vision context reuse)
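As a usage sketch, here is how a client might package text plus an image for the engine using the OpenAI content-parts convention. This is an illustrative client-side example, not oMLX code: the model id is an assumption, and `build_vision_message` is a hypothetical helper.

```python
# Hypothetical client-side sketch: building an OpenAI-style chat payload
# that mixes text and one base64 image for a VLM served by oMLX.
import base64

def build_vision_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Pack a text prompt and one image into a single user message
    using the OpenAI content-parts convention (base64 data URI)."""
    data_uri = f"data:{mime};base64," + base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_uri}},
        ],
    }

payload = {
    "model": "mlx-community/Qwen2.5-VL-7B-Instruct-4bit",  # assumed model id
    "messages": [build_vision_message("Describe this image.", b"\x89PNG...")],
}
```

Because the image travels as a data URI inside an ordinary chat message, the same request shape works for single- and multi-turn vision chat.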
Image Processing (omlx/utils/image.py)
- Image input support: base64 data URIs, HTTP/HTTPS URLs, local file paths
- Multi-image chat for supported models (Qwen2.5-VL, GLM-4V, etc.)
- SHA256 image hashing for prefix cache deduplication
- Anthropic API vision support: base64 `image_url` conversion for `/v1/messages`
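The three image input forms and the hash-based dedup above can be sketched as follows. Function names here are illustrative, not the actual `omlx/utils/image.py` API.

```python
# Minimal sketch (assumed helper names): resolve an image reference to
# raw bytes, then derive a content hash for prefix-cache deduplication.
import base64
import hashlib
from pathlib import Path

def load_image_bytes(source: str) -> bytes:
    """Resolve a data URI, HTTP(S) URL, or local file path to raw bytes."""
    if source.startswith("data:"):
        # base64 data URI: strip the "data:<mime>;base64," header
        _, _, b64 = source.partition("base64,")
        return base64.b64decode(b64)
    if source.startswith(("http://", "https://")):
        import urllib.request
        with urllib.request.urlopen(source) as resp:  # fetched over the network
            return resp.read()
    return Path(source).read_bytes()  # local file path

def image_cache_key(data: bytes) -> str:
    """Content hash so identical images map to the same cache entry."""
    return hashlib.sha256(data).hexdigest()
```

Hashing the decoded bytes (rather than the URL or path) means the same image sent as a data URI, a URL, or a file all hit the same prefix-cache entry.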
OCR Models
- Auto-prompts for DeepSeek-OCR, DOTS-OCR, GLM-OCR with forced `temperature=0.0`
- Stop token resolution for OCR-specific sequences (`<|user|>`, `<|im_end|>`, etc.)
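The OCR behavior above can be sketched as a request transform. Everything here is an assumption for illustration: the prompt strings, helper name, and mapping are invented, not the actual oMLX defaults.

```python
# Illustrative sketch (not oMLX code): per-family OCR defaults with
# forced greedy decoding and OCR-specific stop sequences.
OCR_DEFAULTS = {
    "deepseek-ocr": "<image>\nFree OCR.",                 # assumed prompt strings
    "dots-ocr": "Extract all text from the image.",
    "glm-ocr": "Recognize the text in this image.",
}
OCR_STOP_TOKENS = ["<|user|>", "<|im_end|>"]

def apply_ocr_settings(model_name: str, request: dict) -> dict:
    """If the model is an OCR model, inject a default prompt (when the
    user gave none), force temperature=0.0, and add OCR stop sequences."""
    family = next((k for k in OCR_DEFAULTS if k in model_name.lower()), None)
    if family is None:
        return request  # non-OCR models pass through untouched
    request = dict(request)
    request.setdefault("prompt", OCR_DEFAULTS[family])
    request["temperature"] = 0.0  # OCR output should be deterministic
    request["stop"] = list(set(request.get("stop", []) + OCR_STOP_TOKENS))
    return request
```

Forcing `temperature=0.0` makes decoding greedy, which is what you want for transcription tasks where sampling variance only adds errors.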
Tool Calling for VLM
- mlx-lm native tool parser injection into VLM tokenizer at engine start
- Image + tool calling: tool definitions included in vision prompts via HF `apply_chat_template`
- Supports `json_tools`, `qwen3_coder`, `glm47`, `mistral`, and all mlx-lm parsers
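A request that exercises image + tool calling together looks like the sketch below. The tool schema and model id are invented examples; only the overall OpenAI-style shape (content parts plus a `tools` array) is what the feature above describes.

```python
# Hedged example: one request combining an image with a tool definition.
# The get_weather tool is a made-up schema for illustration.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

request = {
    "model": "mlx-community/Qwen2.5-VL-7B-Instruct-4bit",  # assumed model id
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What city is this, and what's the weather there?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/skyline.jpg"}},
        ],
    }],
    "tools": [weather_tool],
    "tool_choice": "auto",
}
```

The model sees both the image tokens and the tool definitions in one prompt, so it can ground a tool call (here, the city argument) in what it sees.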
Benchmark
- VLM image benchmark: "Include sample image" checkbox in continuous batching tests
Model Discovery
- VLM auto-detection via mlx-vlm config patterns (`vision_config`, processor files)
- VLM model settings modal in admin dashboard
- Bench model filter updated to include VLM models
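The detection heuristic above can be sketched roughly as follows. These are assumed rules for illustration, not the exact patterns oMLX checks.

```python
# Illustrative sketch: treat a model directory as a VLM if its
# config.json declares a vision_config section, or if it ships a
# processor config file. Heuristics assumed, not the exact omlx rules.
import json
from pathlib import Path

PROCESSOR_FILES = ("preprocessor_config.json", "processor_config.json")

def is_vlm_model(model_dir: str) -> bool:
    root = Path(model_dir)
    cfg_path = root / "config.json"
    if cfg_path.exists():
        cfg = json.loads(cfg_path.read_text())
        if "vision_config" in cfg:  # vision tower declared in the config
            return True
    # Fall back to the presence of an image processor config
    return any((root / name).exists() for name in PROCESSOR_FILES)
```

Keying off `vision_config` works because mlx-vlm model configs nest the vision tower's settings under that section, while text-only configs omit it.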
Tests
- `test_vlm_engine.py` — 30 tests covering tool calling injection, chat template, OCR prompts, message processing, vision inputs
- `test_vlm_model_adapter.py` — VLM adapter property, cache, embedding, forward pass tests
- `test_image_utils.py` — Image loading, extraction, hashing tests
- `test_model_discovery.py` — VLM model detection tests
33 files changed, +3,414 / -68 lines
Full changelog: v0.1.15...v0.2.0