github jundot/omlx v0.1.0
v0.1.0 — Initial Release

LLM inference, optimized for your Mac
Continuous batching and infinite SSD caching, managed directly from your menu bar.

oMLX Dashboard


Features

Inference

  • Continuous Batching — Handle multiple concurrent requests with mlx-lm BatchGenerator
  • Multi-Model Serving — Load LLM, Embedding, Reranker models simultaneously, with LRU eviction
  • Reasoning Model Support — Automatic <think> tag handling for DeepSeek, MiniMax models
  • Harmony Protocol — Native support for gpt-oss models via openai-harmony parser
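
Continuous batching means several clients can send requests at once and the server interleaves their token generation instead of queuing them serially. A minimal client-side sketch, assuming the OpenAI-compatible endpoint at `http://localhost:8000` and a placeholder model name (adjust both to your oMLX settings):

```python
# Sketch: firing several chat requests concurrently so the server can
# batch them. Host, port, and model name are assumptions, not defaults
# documented here.
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # assumed

def build_payload(prompt: str, model: str = "my-mlx-model") -> dict:
    """Assemble a standard chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }

def ask(prompt: str) -> str:
    """POST one request and return the assistant's reply text."""
    req = Request(
        ENDPOINT,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# With the server running, concurrent requests share one batch:
# with ThreadPoolExecutor(max_workers=4) as pool:
#     answers = list(pool.map(ask, ["Hi", "2+2?", "Name a color", "Bye"]))
```
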

Caching

  • Paged KV Cache — Block-based with prefix sharing and copy-on-write (vLLM-inspired)
  • SSD Tiered Caching — Automatic GPU to SSD offloading for virtually unlimited context caching
  • Hybrid Cache — Mixed KVCache + RotatingKVCache for complex architectures (Gemma3, etc.)
  • Persistent Cache — KV cache blocks survive server restarts via safetensors storage
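
The practical payoff of prefix sharing: requests that begin with the same long system prompt can reuse the KV blocks already cached for that prefix, so only the differing suffix needs fresh prefill. A conceptual sketch (block granularity and copy-on-write happen inside the server, not in client code):

```python
# Two requests sharing a long system prompt. A paged KV cache with prefix
# sharing serves the common leading messages from cached blocks.
SYSTEM = {"role": "system", "content": "You are a helpful assistant. " * 50}

req_a = [SYSTEM, {"role": "user", "content": "Summarize MLX in one line."}]
req_b = [SYSTEM, {"role": "user", "content": "What is continuous batching?"}]

def shared_prefix_len(a: list, b: list) -> int:
    """Count leading messages identical in both requests (the reusable prefix)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# Only the user turns differ, so the system prompt's KV blocks are reusable.
assert shared_prefix_len(req_a, req_b) == 1
```
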

API

  • OpenAI Compatible — /v1/chat/completions, /v1/completions, /v1/models, /v1/embeddings
  • Anthropic Compatible — /v1/messages with streaming support
  • Tool Calling — JSON, Qwen, Gemma, MiniMax, GLM formats + MCP integration
  • Structured Output — JSON mode and JSON Schema validation
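
A sketch of a structured-output request, assuming oMLX follows the OpenAI `response_format` shape for JSON Schema (the model name and schema below are placeholders):

```python
# Build a chat-completions request that constrains the reply to a JSON
# Schema. POST this body to /v1/chat/completions on your oMLX server.
import json

payload = {
    "model": "my-mlx-model",  # placeholder
    "messages": [{"role": "user", "content": "Give me a user record."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "user_record",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                },
                "required": ["name", "age"],
            },
        },
    },
}

body = json.dumps(payload)  # request body, ready to send
```
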

macOS App

  • Native Menubar App — PyObjC-based, not Electron. Start/Stop/Restart from menu bar
  • Admin Dashboard — Real-time monitoring, built-in chat, per-model settings at /admin
  • Model Downloader — Search and download MLX models from HuggingFace in the dashboard
  • Auto-Update Check — GitHub Releases-based update notification
  • Signed & Notarized — Developer ID signed, Apple notarized DMG distribution

Requirements

  • Apple Silicon (M1/M2/M3/M4)
  • macOS 14.0+ (Sonoma)

Install

Download the DMG, drag oMLX to Applications, and launch — that's it.
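
Once the app is running, one way to confirm the server is up is to hit the /v1/models endpoint listed in the API section. The host and port here are assumptions; use whatever the menu-bar app reports:

```python
# Sanity check: list the models the server currently knows about.
import json
from urllib.request import urlopen

BASE = "http://localhost:8000"  # assumed; adjust to your settings

def models_endpoint(base: str) -> str:
    """Join the base URL with the /v1/models path."""
    return base.rstrip("/") + "/v1/models"

# With the server running:
# with urlopen(models_endpoint(BASE)) as resp:
#     print(json.load(resp))
```
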
