# LLM inference, optimized for your Mac
Continuous batching and virtually unlimited SSD-backed KV caching, managed directly from your menu bar.
## Features
### Inference

- Continuous Batching — Handle multiple concurrent requests with the mlx-lm `BatchGenerator`
- Multi-Model Serving — Load LLM, embedding, and reranker models simultaneously, with LRU eviction
- Reasoning Model Support — Automatic `<think>` tag handling for DeepSeek and MiniMax models
- Harmony Protocol — Native support for gpt-oss models via the openai-harmony parser
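To illustrate the `<think>` handling above, here is a minimal sketch of splitting a reasoning model's raw output into its reasoning trace and final answer. The tag name comes from the feature list, but the splitting rules are illustrative assumptions, not oMLX's actual parser.

```python
import re

# Assumed convention: reasoning models emit <think>...</think> before the answer.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from raw model output."""
    match = THINK_RE.search(text)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Remove the first think block; what remains is the user-facing answer.
    answer = THINK_RE.sub("", text, count=1).strip()
    return reasoning, answer

raw = "<think>2+2 is 4.</think>The answer is 4."
print(split_reasoning(raw))  # ('2+2 is 4.', 'The answer is 4.')
```

A server would typically surface the two parts as separate fields (e.g. reasoning vs. content) in the streamed response.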
### Caching
- Paged KV Cache — Block-based with prefix sharing and copy-on-write (vLLM-inspired)
- SSD Tiered Caching — Automatic GPU to SSD offloading for virtually unlimited context caching
- Hybrid Cache — Mixed KVCache + RotatingKVCache for complex architectures (Gemma3, etc.)
- Persistent Cache — KV cache blocks survive server restarts via safetensors storage
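The paged-cache idea above can be sketched in a few lines: fixed-size blocks of tokens are deduplicated by content, so sequences sharing a prompt prefix share physical blocks, and reference counts make copy-on-write possible. All names, the block size, and the dedup-by-content scheme are illustrative assumptions, not oMLX's actual implementation.

```python
from dataclasses import dataclass

BLOCK_SIZE = 4  # tokens per block (illustrative; real systems use larger blocks)

@dataclass
class Block:
    tokens: tuple        # token ids stored in this block
    ref_count: int = 1   # how many sequences reference this block

class PagedCache:
    """Toy sketch of vLLM-style paged KV caching with prefix sharing."""

    def __init__(self):
        self.blocks: list[Block] = []      # physical block table
        self.index: dict[tuple, int] = {}  # full-block content -> block id

    def allocate(self, tokens: list[int]) -> list[int]:
        """Map a token sequence to physical block ids, reusing shared blocks."""
        table = []
        for i in range(0, len(tokens), BLOCK_SIZE):
            chunk = tuple(tokens[i:i + BLOCK_SIZE])
            if len(chunk) == BLOCK_SIZE and chunk in self.index:
                bid = self.index[chunk]           # prefix hit: share the block
                self.blocks[bid].ref_count += 1   # copy-on-write bookkeeping
            else:
                bid = len(self.blocks)
                self.blocks.append(Block(chunk))
                if len(chunk) == BLOCK_SIZE:      # only full blocks are shareable
                    self.index[chunk] = bid
            table.append(bid)
        return table

cache = PagedCache()
a = cache.allocate([1, 2, 3, 4, 5, 6, 7, 8, 9])   # blocks for sequence A
b = cache.allocate([1, 2, 3, 4, 5, 6, 7, 8, 42])  # shares A's first two blocks
print(a, b)  # [0, 1, 2] [0, 1, 3]
```

A real implementation stores attention keys/values (not token ids) in each block and frees a block only when its `ref_count` drops to zero; cold blocks can then be offloaded to SSD as described above.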
### API

- OpenAI Compatible — `/v1/chat/completions`, `/v1/completions`, `/v1/models`, `/v1/embeddings`
- Anthropic Compatible — `/v1/messages` with streaming support
- Tool Calling — JSON, Qwen, Gemma, MiniMax, GLM formats + MCP integration
- Structured Output — JSON mode and JSON Schema validation
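Because the server speaks the OpenAI wire format, any OpenAI client works against it. Below is a dependency-free sketch using only the standard library; the base URL/port and model name are assumptions (check the admin dashboard for the actual values), and `json_mode` shows the JSON-mode structured-output flag.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumed port; adjust to your server

def build_chat_request(model: str, prompt: str, json_mode: bool = False) -> dict:
    """Build a /v1/chat/completions payload in the OpenAI wire format."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if json_mode:
        # OpenAI-style structured output: constrain the reply to valid JSON.
        payload["response_format"] = {"type": "json_object"}
    return payload

def chat(model: str, prompt: str) -> str:
    """Send one chat request to the local server and return the reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The same payload works with the official `openai` Python client by pointing its `base_url` at the local server.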
### macOS App

- Native Menubar App — PyObjC-based, not Electron; Start/Stop/Restart from the menu bar
- Admin Dashboard — Real-time monitoring, built-in chat, and per-model settings at `/admin`
- Model Downloader — Search and download MLX models from HuggingFace in the dashboard
- Auto-Update Check — GitHub Releases-based update notifications
- Signed & Notarized — Developer ID signed, Apple-notarized DMG distribution
## Requirements
- Apple Silicon (M1/M2/M3/M4)
- macOS 14.0+ (Sonoma)
## Install
Download the DMG, drag oMLX to Applications, and launch — that's it.
