Highlight: Vision-Language Model Support with Tiered Caching
Starting with v0.2.0, oMLX sees the world — not just text.
Vision-language models now run natively on your Mac with the same continuous batching, paged KV cache, and SSD-tiered caching that powers text inference. Combined with production-grade tool calling, your Apple Silicon machine becomes a local inference server that doesn't just demo well — it actually works. Agentic coding, OpenClaw, multi-turn vision chat: real workloads, real performance, no cloud required.
What's New
VLM Engine (omlx/engine/vlm.py, omlx/models/vlm.py)
- Vision-Language Model engine via mlx-vlm integration for vision encoding + mlx-lm `BatchGenerator` for inference
- `VLMModelAdapter` wrapping the VLM's `language_model` for full `BatchGenerator` compatibility
- Batched VLM prefill with per-UID embeddings in `_BoundarySnapshotBatchGenerator`
- Chunked prefill support with embedding offset tracking for large vision inputs
- Prefix cache and paged cache support for VLM requests (vision context reuse)
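As a usage sketch, here is how a client might package text plus an image for the engine using the OpenAI content-parts convention. This is an illustrative client-side example, not oMLX code: the model id is an assumption, and `build_vision_message` is a hypothetical helper.

```python
# Hypothetical client-side sketch: building an OpenAI-style chat payload
# that mixes text and one base64 image for a VLM served by oMLX.
import base64

def build_vision_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Pack a text prompt and one image into a single user message
    using the OpenAI content-parts convention (base64 data URI)."""
    data_uri = f"data:{mime};base64," + base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_uri}},
        ],
    }

payload = {
    "model": "mlx-community/Qwen2.5-VL-7B-Instruct-4bit",  # assumed model id
    "messages": [build_vision_message("Describe this image.", b"\x89PNG...")],
}
```

Because the image travels as a data URI inside an ordinary chat message, the same request shape works for single- and multi-turn vision chat.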
Image Processing (omlx/utils/image.py)
- Image input support: base64 data URIs, HTTP/HTTPS URLs, local file paths
- Multi-image chat for supported models (Qwen2.5-VL, GLM-4V, etc.)
- SHA256 image hashing for prefix cache deduplication
- Anthropic API vision support: base64 `image_url` conversion for `/v1/messages`
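The three image input forms and the hash-based dedup above can be sketched as follows. Function names here are illustrative, not the actual `omlx/utils/image.py` API.

```python
# Minimal sketch (assumed helper names): resolve an image reference to
# raw bytes, then derive a content hash for prefix-cache deduplication.
import base64
import hashlib
from pathlib import Path

def load_image_bytes(source: str) -> bytes:
    """Resolve a data URI, HTTP(S) URL, or local file path to raw bytes."""
    if source.startswith("data:"):
        # base64 data URI: strip the "data:<mime>;base64," header
        _, _, b64 = source.partition("base64,")
        return base64.b64decode(b64)
    if source.startswith(("http://", "https://")):
        import urllib.request
        with urllib.request.urlopen(source) as resp:  # fetched over the network
            return resp.read()
    return Path(source).read_bytes()  # local file path

def image_cache_key(data: bytes) -> str:
    """Content hash so identical images map to the same cache entry."""
    return hashlib.sha256(data).hexdigest()
```

Hashing the decoded bytes (rather than the URL or path) means the same image sent as a data URI, a URL, or a file all hit the same prefix-cache entry.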
OCR Models
- Auto-prompts for DeepSeek-OCR, DOTS-OCR, GLM-OCR with forced `temperature=0.0`
- Stop token resolution for OCR-specific sequences (`<|user|>`, `<|im_end|>`, etc.)
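The OCR behavior above can be sketched as a request transform. Everything here is an assumption for illustration: the prompt strings, helper name, and mapping are invented, not the actual oMLX defaults.

```python
# Illustrative sketch (not oMLX code): per-family OCR defaults with
# forced greedy decoding and OCR-specific stop sequences.
OCR_DEFAULTS = {
    "deepseek-ocr": "<image>\nFree OCR.",                 # assumed prompt strings
    "dots-ocr": "Extract all text from the image.",
    "glm-ocr": "Recognize the text in this image.",
}
OCR_STOP_TOKENS = ["<|user|>", "<|im_end|>"]

def apply_ocr_settings(model_name: str, request: dict) -> dict:
    """If the model is an OCR model, inject a default prompt (when the
    user gave none), force temperature=0.0, and add OCR stop sequences."""
    family = next((k for k in OCR_DEFAULTS if k in model_name.lower()), None)
    if family is None:
        return request  # non-OCR models pass through untouched
    request = dict(request)
    request.setdefault("prompt", OCR_DEFAULTS[family])
    request["temperature"] = 0.0  # OCR output should be deterministic
    request["stop"] = list(set(request.get("stop", []) + OCR_STOP_TOKENS))
    return request
```

Forcing `temperature=0.0` makes decoding greedy, which is what you want for transcription tasks where sampling variance only adds errors.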
Tool Calling for VLM
- mlx-lm native tool parser injection into VLM tokenizer at engine start
- Image + tool calling: tool definitions included in vision prompts via HF `apply_chat_template`
- Supports `json_tools`, `qwen3_coder`, `glm47`, `mistral`, and all mlx-lm parsers
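A request that exercises image + tool calling together looks like the sketch below. The tool schema and model id are invented examples; only the overall OpenAI-style shape (content parts plus a `tools` array) is what the feature above describes.

```python
# Hedged example: one request combining an image with a tool definition.
# The get_weather tool is a made-up schema for illustration.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

request = {
    "model": "mlx-community/Qwen2.5-VL-7B-Instruct-4bit",  # assumed model id
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What city is this, and what's the weather there?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/skyline.jpg"}},
        ],
    }],
    "tools": [weather_tool],
    "tool_choice": "auto",
}
```

The model sees both the image tokens and the tool definitions in one prompt, so it can ground a tool call (here, the city argument) in what it sees.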
Benchmark
- VLM image benchmark: "Include sample image" checkbox in continuous batching tests
Model Discovery
- VLM auto-detection via mlx-vlm config patterns (`vision_config`, processor files)
- VLM model settings modal in admin dashboard
- Bench model filter updated to include VLM models
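The detection heuristic above can be sketched roughly as follows. These are assumed rules for illustration, not the exact patterns oMLX checks.

```python
# Illustrative sketch: treat a model directory as a VLM if its
# config.json declares a vision_config section, or if it ships a
# processor config file. Heuristics assumed, not the exact omlx rules.
import json
from pathlib import Path

PROCESSOR_FILES = ("preprocessor_config.json", "processor_config.json")

def is_vlm_model(model_dir: str) -> bool:
    root = Path(model_dir)
    cfg_path = root / "config.json"
    if cfg_path.exists():
        cfg = json.loads(cfg_path.read_text())
        if "vision_config" in cfg:  # vision tower declared in the config
            return True
    # Fall back to the presence of an image processor config
    return any((root / name).exists() for name in PROCESSOR_FILES)
```

Keying off `vision_config` works because mlx-vlm model configs nest the vision tower's settings under that section, while text-only configs omit it.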
Tests
- `test_vlm_engine.py` — 30 tests covering tool calling injection, chat template, OCR prompts, message processing, vision inputs
- `test_vlm_model_adapter.py` — VLM adapter property, cache, embedding, forward pass tests
- `test_image_utils.py` — Image loading, extraction, hashing tests
- `test_model_discovery.py` — VLM model detection tests
33 files changed, +3,414 / -68 lines
Full changelog: v0.1.15...v0.2.0