github jundot/omlx v0.1.1
oMLX v0.1.1

one month ago

What's New

Two-level model directory scanning (#1)

  • Support for organization folder layouts (e.g., mlx-community/llama-3b/)
  • Flat and two-level directories can coexist in the same model directory
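As a rough illustration of how flat and organization-style layouts might coexist, the sketch below walks a model directory at most two levels deep. The `config.json` marker used to recognize a model folder and the `scan_models` helper are assumptions for illustration, not oMLX's actual implementation.

```python
from pathlib import Path

# Assumption: a folder counts as a model if it contains this file.
MODEL_MARKER = "config.json"


def scan_models(root: Path) -> dict[str, Path]:
    """Return {model_id: path} for flat and two-level (org/model) layouts."""
    models: dict[str, Path] = {}
    for entry in sorted(root.iterdir()):
        if not entry.is_dir():
            continue
        if (entry / MODEL_MARKER).is_file():
            # Flat layout: <root>/<model>/
            models[entry.name] = entry
            continue
        # Otherwise treat the folder as an organization: <root>/<org>/<model>/
        for child in sorted(entry.iterdir()):
            if child.is_dir() and (child / MODEL_MARKER).is_file():
                models[f"{entry.name}/{child.name}"] = child
    return models
```

With a directory holding both `my-model/` and `mlx-community/llama-3b/`, this sketch would report the ids `my-model` and `mlx-community/llama-3b`.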

Streaming tool call parsing (#2)

  • Stream tool calls in OpenAI-compatible format
  • XML fallback parser for GLM/Qwen/Llama models without native tool call support
  • Content buffering prevents duplicate tool call output
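A minimal sketch of what an XML fallback parser could look like, assuming the model wraps a JSON object in `<tool_call>...</tool_call>` tags (a common convention, but an assumption here). The `extract_tool_calls` helper is hypothetical; it strips parsed calls from the text, so the same call is not also emitted as content, and returns entries shaped like OpenAI `tool_calls` items.

```python
import json
import re
import uuid

# Assumption: tool calls appear as <tool_call>{...json...}</tool_call>.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> tuple[str, list[dict]]:
    """Split buffered text into (remaining content, OpenAI-style tool calls)."""
    calls: list[dict] = []

    def _replace(match: re.Match) -> str:
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            return match.group(0)  # leave unparseable blocks in the content
        calls.append({
            "index": len(calls),
            "id": f"call_{uuid.uuid4().hex[:8]}",
            "type": "function",
            "function": {
                "name": payload.get("name", ""),
                # OpenAI-compatible clients expect arguments as a JSON string.
                "arguments": json.dumps(payload.get("arguments", {})),
            },
        })
        return ""  # drop the parsed block so it is not echoed as content

    remaining = TOOL_CALL_RE.sub(_replace, text).strip()
    return remaining, calls
```

In a streaming server, a function like this would typically run on buffered text once a closing tag has arrived, rather than on each token.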

Client disconnect detection (#3)

  • Streaming responses now detect client disconnects via ASGI
  • Proper cleanup of async generators and pending tasks on disconnect
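In ASGI, a client disconnect surfaces as an `http.disconnect` message from the `receive` callable. The sketch below shows one way a streaming response could race that message against the next chunk and clean up the generator when the client goes away. The `stream_until_disconnect` wrapper is a hypothetical helper, not oMLX's code.

```python
import asyncio
from typing import AsyncGenerator, Awaitable, Callable

_DONE = object()  # sentinel marking generator exhaustion


async def _wait_for_disconnect(receive: Callable[[], Awaitable[dict]]) -> None:
    # Block until the ASGI server reports that the client went away.
    while True:
        message = await receive()
        if message["type"] == "http.disconnect":
            return


async def _next_or_done(chunks: AsyncGenerator[bytes, None]):
    try:
        return await chunks.__anext__()
    except StopAsyncIteration:
        return _DONE


async def stream_until_disconnect(
    receive: Callable[[], Awaitable[dict]],
    chunks: AsyncGenerator[bytes, None],
) -> AsyncGenerator[bytes, None]:
    """Yield chunks until the generator ends or the client disconnects."""
    watcher = asyncio.ensure_future(_wait_for_disconnect(receive))
    try:
        while True:
            pending = asyncio.ensure_future(_next_or_done(chunks))
            done, _ = await asyncio.wait(
                {watcher, pending}, return_when=asyncio.FIRST_COMPLETED
            )
            if watcher in done:
                # Client is gone: cancel the in-flight chunk and stop.
                pending.cancel()
                try:
                    await pending
                except asyncio.CancelledError:
                    pass
                break
            chunk = pending.result()
            if chunk is _DONE:
                break
            yield chunk
    finally:
        watcher.cancel()
        await chunks.aclose()  # runs the generator's own cleanup, if any
```

Cancelling the pending task propagates into the underlying generator, so any `finally` blocks there (for example, releasing a generation slot) get a chance to run.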

KV cache headroom & manual model unload (#4)

  • 25% KV cache headroom during model loading for better multi-model memory management
  • Manual model unload via POST /v1/models/{model_id}/unload and admin panel
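The following sketch shows how the unload endpoint might be called from Python, plus one plausible reading of the headroom rule. The base URL, the empty request body, and the interpretation of "25% headroom" as an extra 25% on top of the weight size are all assumptions; the release notes only give the route and the percentage.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:8000"  # assumption: adjust to your server address


def build_unload_request(model_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /v1/models/{model_id}/unload."""
    url = f"{base_url}/v1/models/{quote(model_id)}/unload"
    # Assumption: the endpoint needs no request body.
    return urllib.request.Request(url, data=b"", method="POST")


def fits_with_headroom(weights_bytes: int, free_bytes: int, headroom: float = 0.25) -> bool:
    """Assumed reading: reserve weights plus 25% extra for the KV cache."""
    required = int(weights_bytes * (1 + headroom))
    return required <= free_bytes


if __name__ == "__main__":
    request = build_unload_request("llama-3b")
    # Sending the request requires a running server:
    # with urllib.request.urlopen(request) as response:
    #     print(response.status)
    print(request.get_method(), request.full_url)
```

Under that assumed reading, a model with 4 GiB of weights would need 5 GiB free before loading.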

New Contributors

Thanks to @thornad for all four PRs in this release!
