jundot/omlx v0.2.0


Highlight: Vision-Language Model Support with Tiered Caching

Starting with v0.2.0, oMLX sees the world — not just text.

Vision-language models now run natively on your Mac with the same continuous batching, paged KV cache, and SSD-tiered caching that powers text inference. Combined with production-grade tool calling, your Apple Silicon machine becomes a local inference server that doesn't just demo well — it actually works. Agentic coding, OpenClaw, multi-turn vision chat: real workloads, real performance, no cloud required.

What's New

VLM Engine (omlx/engine/vlm.py, omlx/models/vlm.py)

  • Vision-Language Model engine combining mlx-vlm for vision encoding with mlx-lm's BatchGenerator for inference
  • VLMModelAdapter wrapping VLM's language_model for full BatchGenerator compatibility
  • Batched VLM prefill with per-UID embeddings in _BoundarySnapshotBatchGenerator
  • Chunked prefill support with embedding offset tracking for large vision inputs
  • Prefix cache and paged cache support for VLM requests (vision context reuse)
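The adapter idea above can be sketched in a few lines: expose a VLM's inner language model through the interface a text-only batch generator expects, while letting the engine substitute precomputed vision+text embeddings per request UID. The class and method names below are illustrative, not oMLX's actual `VLMModelAdapter` API.

```python
# Minimal sketch, assuming the generator drives a decoder-only language
# model and accepts either token ids or precomputed embeddings at prefill.
class LanguageModelAdapter:
    def __init__(self, vlm):
        self._lm = vlm.language_model   # the decoder the batch generator steps
        self._embeddings = {}           # request UID -> fused prefill embeddings

    def set_embeddings(self, uid, embeddings):
        """Register fused vision+text embeddings for one request."""
        self._embeddings[uid] = embeddings

    def prefill_inputs(self, uid, token_ids):
        """Use fused embeddings when present, else fall back to plain tokens."""
        return self._embeddings.pop(uid, token_ids)

    def __getattr__(self, name):
        # Delegate everything else (layers, config, call) to the language model,
        # so generator code written for text models keeps working.
        return getattr(self._lm, name)
```

Per-UID bookkeeping is what lets a single batch mix vision and text-only requests: each request's prefill consumes its own embeddings exactly once.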

Image Processing (omlx/utils/image.py)

  • Image input support: base64 data URIs, HTTP/HTTPS URLs, local file paths
  • Multi-image chat for supported models (Qwen2.5-VL, GLM-4V, etc.)
  • SHA256 image hashing for prefix cache deduplication
  • Anthropic API vision support: base64 image_url conversion for /v1/messages
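The three input forms can be handled with the standard library alone. This is a hedged sketch of the pattern, not the code in omlx/utils/image.py; the function names are hypothetical.

```python
import base64
import hashlib
import urllib.request
from pathlib import Path


def load_image_bytes(source: str) -> bytes:
    """Fetch raw image bytes from a data URI, an HTTP(S) URL, or a local path."""
    if source.startswith("data:"):
        # base64 data URI, e.g. "data:image/png;base64,<payload>"
        _, _, payload = source.partition("base64,")
        return base64.b64decode(payload)
    if source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source) as resp:
            return resp.read()
    return Path(source).read_bytes()


def image_cache_key(data: bytes) -> str:
    """SHA-256 digest, suitable for deduplicating identical images in a prefix cache."""
    return hashlib.sha256(data).hexdigest()
```

Hashing the decoded bytes (rather than the source string) means the same image arriving as a URL in one request and a data URI in the next still maps to one cache entry.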

OCR Models

  • Auto-prompts for DeepSeek-OCR, DOTS-OCR, GLM-OCR with forced temperature=0.0
  • Stop token resolution for OCR-specific sequences (<|user|>, <|im_end|>, etc.)
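The OCR handling above amounts to a lookup table plus a temperature override. The prompts and stop strings below are placeholders to show the shape of the mechanism; oMLX's actual defaults may differ.

```python
# Hypothetical defaults table; keys are matched as substrings of the model id.
OCR_DEFAULTS = {
    "deepseek-ocr": {"prompt": "Free OCR.", "stop": ["<|user|>"]},
    "dots-ocr": {"prompt": "Extract all text from the image.", "stop": ["<|im_end|>"]},
    "glm-ocr": {"prompt": "Recognize the text in the image.", "stop": ["<|user|>"]},
}


def resolve_ocr_params(model_id: str, user_temperature: float) -> dict:
    """Return sampling params, forcing temperature=0.0 for known OCR models."""
    for key, cfg in OCR_DEFAULTS.items():
        if key in model_id.lower():
            return {"prompt": cfg["prompt"], "stop": cfg["stop"], "temperature": 0.0}
    # Non-OCR models keep the caller's temperature and get no auto-prompt.
    return {"prompt": None, "stop": [], "temperature": user_temperature}
```

Forcing temperature to 0.0 makes OCR output deterministic, which is usually what you want when the "generation" is really transcription.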

Tool Calling for VLM

  • mlx-lm native tool parser injection into VLM tokenizer at engine start
  • Image + tool calling: tool definitions included in vision prompts via HF apply_chat_template
  • Supports json_tools, qwen3_coder, glm47, mistral, and all mlx-lm parsers
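To make the parser-injection idea concrete, here is an illustrative extractor in the spirit of a `json_tools`-style parser: pull JSON tool calls out of `<tool_call>...</tool_call>` tags in the generated text. The real parsers shipped with mlx-lm are model-specific and richer than this.

```python
import json
import re

# Assumed output convention: each call is a JSON object wrapped in tags,
# as several chat templates emit. Malformed blocks are skipped, not raised.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict]:
    """Extract well-formed JSON tool calls from generated text."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            continue  # leave malformed blocks for the caller to surface as text
    return calls
```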

Benchmark

  • VLM image benchmark: "Include sample image" checkbox in continuous batching tests

Model Discovery

  • VLM auto-detection via mlx-vlm config patterns (vision_config, processor files)
  • VLM model settings modal in admin dashboard
  • Bench model filter updated to include VLM models
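The detection rule described above can be sketched as a filesystem heuristic: a `vision_config` key in `config.json`, or the presence of processor files, marks a model directory as a VLM. The exact patterns oMLX checks may differ.

```python
import json
from pathlib import Path


def is_vlm_model(model_dir: str) -> bool:
    """Heuristic VLM detection for a local model directory (illustrative sketch)."""
    root = Path(model_dir)
    config = root / "config.json"
    if config.is_file():
        cfg = json.loads(config.read_text())
        if "vision_config" in cfg:
            return True
    # Processor configs typically ship only with multimodal checkpoints.
    return any((root / name).is_file()
               for name in ("preprocessor_config.json", "processor_config.json"))
```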

Tests

  • test_vlm_engine.py — 30 tests covering tool calling injection, chat template, OCR prompts, message processing, vision inputs
  • test_vlm_model_adapter.py — VLM adapter property, cache, embedding, forward pass tests
  • test_image_utils.py — Image loading, extraction, hashing tests
  • test_model_discovery.py — VLM model detection tests

33 files changed, +3,414 / -68 lines

Full changelog: v0.1.15...v0.2.0
