Highlight: Vision-Language Model Support with Tiered Caching
Starting with v0.2.0, oMLX sees the world - not just text.
Vision-language models now run natively on your Mac with the same continuous batching, paged KV cache, and SSD-tiered caching that powers text inference. Combined with production-grade tool calling, your Apple Silicon machine becomes a local inference server that doesn't just demo well - it actually works. Agentic coding, OpenClaw, multi-turn vision chat: real workloads, real performance, no cloud required.
For full v0.2.0 feature details, see v0.2.0 release notes.
New Features (v0.2.3)
Option to disable model memory limit
- Added option to disable model memory limit by setting the slider to 0 in the admin dashboard
Bug Fixes (v0.2.3)
Streaming response corruption on keep-alive connections (#80)
- Fixed `TransferEncodingError` when sending a second message in the same Open WebUI conversation over a local connection
- Removed duplicate ASGI `receive()` consumers that corrupted HTTP keep-alive state
- Replaced `BaseHTTPMiddleware` with a pure ASGI middleware to avoid streaming response pipe interference
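To illustrate why the middleware swap matters, here is a minimal sketch of a pure ASGI middleware. All names here (`HeaderMiddleware`, the `x-served-by` header) are hypothetical, not the actual oMLX code: the point is that the app's `receive` callable is passed through untouched, so no second consumer drains the request stream and keep-alive state stays intact.

```python
class HeaderMiddleware:
    """Hypothetical pure ASGI middleware: wraps send, never wraps receive."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        async def send_wrapper(message):
            # Only touch the response-start message; pass body chunks through,
            # which keeps streaming responses working.
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                headers.append((b"x-served-by", b"omlx"))
                message = {**message, "headers": headers}
            await send(message)

        # receive is forwarded as-is -- no duplicate receive() consumer.
        await self.app(scope, receive, send_wrapper)
```

By contrast, `BaseHTTPMiddleware` re-reads the request into its own `Request` object, which is one known way a second `receive()` consumer can appear in the pipeline.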
VLM batch generation shape mismatch (#79)
- Fixed shape mismatch error during VLM batch generation
Homebrew install failure (#78)
- Fixed `brew install` by making MCP an optional dependency
SSD cache fallback robustness (#74, #75)
- Fixed block metadata not being rolled back when SSD cache save fails
- Fixed SSD fallback block registration in paged cache
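The rollback fix follows a common pattern worth spelling out: if metadata is registered before the SSD write and the write then fails, the metadata must be undone, or the paged cache will believe a block exists on disk that was never written. The sketch below is a hypothetical illustration of that pattern (`BlockIndex`, `save_block_to_ssd`, and their signatures are invented for this example, not oMLX internals).

```python
class BlockIndex:
    """Minimal in-memory index standing in for paged-cache block metadata."""

    def __init__(self):
        self.on_ssd = set()

    def register(self, block_id):
        self.on_ssd.add(block_id)

    def unregister(self, block_id):
        self.on_ssd.discard(block_id)


def save_block_to_ssd(index, block_id, data, write_fn):
    """Register metadata, attempt the write, and roll back on failure."""
    index.register(block_id)
    try:
        write_fn(block_id, data)  # may raise (disk full, I/O error, ...)
    except OSError:
        index.unregister(block_id)  # roll back so metadata matches disk
        return False
    return True
```

Without the `unregister` call in the failure path, a later cache lookup would try to load a block that the save never persisted.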
Scheduler cache corruption recovery
- Broadened scheduler cache corruption recovery to also catch `AttributeError` and `ValueError`
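A minimal sketch of what broadening recovery looks like, under the assumption that the scheduler resets a request's cache and retries when a step raises a recoverable error (the names `run_step`, `step_fn`, and `reset_cache` are hypothetical, as is the choice of `RuntimeError` for the originally caught type):

```python
def run_step(step_fn, reset_cache):
    """Run one scheduler step, treating cache corruption as recoverable."""
    try:
        return step_fn()
    # AttributeError and ValueError are now caught alongside the original
    # error type, so malformed cache state triggers recovery, not a crash.
    except (RuntimeError, AttributeError, ValueError):
        reset_cache()  # drop the corrupted cache state
        return step_fn()  # retry once with a clean cache
```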
Full changelog: v0.2.2...v0.2.3
New Contributors
Thanks to @lyonsno for the contribution!