github jundot/omlx v0.2.3


Highlight: Vision-Language Model Support with Tiered Caching

Starting with v0.2.0, oMLX sees the world - not just text.

Vision-language models now run natively on your Mac with the same continuous batching, paged KV cache, and SSD-tiered caching that powers text inference. Combined with production-grade tool calling, your Apple Silicon machine becomes a local inference server that doesn't just demo well - it actually works. Agentic coding, OpenClaw, multi-turn vision chat: real workloads, real performance, no cloud required.

For full v0.2.0 feature details, see v0.2.0 release notes.

New Features (v0.2.3)

Option to disable model memory limit

  • Added option to disable model memory limit by setting the slider to 0 in the admin dashboard
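A minimal sketch of the "0 means unlimited" convention this option implies. The function name and units are illustrative, not oMLX's actual API:

```python
# Hypothetical helper: interpret a dashboard slider value of 0 as
# "no model memory limit" (names and units are illustrative).
def effective_memory_limit(slider_gb: float):
    """Return the limit in bytes, or None when the slider is set to 0."""
    if slider_gb <= 0:
        return None  # 0 disables the model memory limit entirely
    return int(slider_gb * 1024**3)
```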

Bug Fixes (v0.2.3)

Streaming response corruption on keep-alive connections (#80)

  • Fixed TransferEncodingError when sending a second message in the same Open WebUI conversation over a local connection
  • Removed duplicate ASGI receive() consumers that corrupted HTTP keep-alive state
  • Replaced BaseHTTPMiddleware with a pure ASGI middleware to avoid streaming response pipe interference
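The key difference is that a pure ASGI middleware wraps `send()` instead of consuming `receive()`, so streaming bodies and keep-alive state pass through untouched. A minimal sketch of the pattern (the injected header is a placeholder, not what oMLX adds):

```python
class PureASGIMiddleware:
    """Pure ASGI middleware: intercepts outgoing messages via a send()
    wrapper and never calls receive() itself, so it cannot duplicate
    ASGI receive consumers or corrupt HTTP keep-alive state."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        async def send_wrapper(message):
            if message["type"] == "http.response.start":
                # Example modification: append a header (placeholder name).
                headers = list(message.get("headers", []))
                headers.append((b"x-example-middleware", b"1"))
                message = {**message, "headers": headers}
            await send(message)  # body chunks stream through unmodified

        await self.app(scope, receive, send_wrapper)
```

Unlike `BaseHTTPMiddleware`, this never buffers or re-reads the request/response stream, which is what interfered with streaming responses.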

VLM batch generation shape mismatch (#79)

  • Fixed shape mismatch error during VLM batch generation
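Shape mismatches in batch generation typically arise when unequal-length sequences are stacked into one tensor. A hedged sketch of the usual remedy, assuming (the release notes don't specify) that the fix involved padding to a common length before stacking:

```python
# Hypothetical sketch: pad variable-length token sequences to the
# batch's maximum length so they can be stacked without a shape error.
def pad_batch(seqs, pad_id=0):
    max_len = max(len(s) for s in seqs)
    return [s + [pad_id] * (max_len - len(s)) for s in seqs]
```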

Homebrew install failure (#78)

  • Fixed brew install by making MCP an optional dependency
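The usual pattern for an optional dependency is a guarded lazy import, so the package installs and starts without it. A hedged sketch (module and message names are illustrative, not oMLX's actual code):

```python
# Hypothetical optional-dependency guard: import MCP lazily so the
# server installs and runs even when the 'mcp' package is absent.
try:
    import mcp  # only needed when MCP tool integration is used
    HAS_MCP = True
except ImportError:
    mcp = None
    HAS_MCP = False

def require_mcp():
    """Fail with a clear message only when an MCP feature is invoked."""
    if not HAS_MCP:
        raise RuntimeError("MCP support requires the optional 'mcp' package")
```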

SSD cache fallback robustness (#74, #75)

  • Fixed block metadata not being rolled back when SSD cache save fails
  • Fixed SSD fallback block registration in paged cache
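The metadata-rollback fix follows a common transactional pattern: commit block metadata only after the SSD write succeeds, and restore the prior state on failure. A minimal sketch under that assumption (all names are illustrative):

```python
# Hypothetical sketch of the rollback pattern: a failed SSD save must
# leave the paged cache's block metadata exactly as it was before.
def save_block_to_ssd(metadata: dict, block_id: str, writer) -> bool:
    snapshot = metadata.get(block_id)  # remember the prior entry, if any
    metadata[block_id] = {"tier": "ssd", "state": "saving"}
    try:
        writer(block_id)  # may raise on disk error
    except OSError:
        # Roll back: restore the previous entry, or remove the new one.
        if snapshot is None:
            metadata.pop(block_id, None)
        else:
            metadata[block_id] = snapshot
        return False
    metadata[block_id]["state"] = "saved"
    return True
```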

Scheduler cache corruption recovery

  • Broadened cache-corruption recovery to also catch AttributeError and ValueError
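The broadened recovery amounts to catching a wider exception tuple on the scheduler's hot path so a corrupted cache entry is evicted instead of crashing the scheduler. A hedged sketch of that shape (function names are illustrative):

```python
# Hypothetical sketch: recovery now catches AttributeError and
# ValueError in addition to the original exception type(s).
def run_step(step, evict):
    """Run one scheduler step; on corruption, evict and continue."""
    try:
        return step()
    except (KeyError, AttributeError, ValueError):
        evict()  # drop the corrupted cache entry and recover
        return None
```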

Full changelog: v0.2.2...v0.2.3

New Contributors

Thanks to @lyonsno for the contribution!
