github jundot/omlx v0.2.17

one month ago

Download the DMG that matches your macOS version (Sequoia or Tahoe).
If you're on an M5 Mac, you must use the macos26-tahoe DMG to get M5 Neural Accelerator support.

Hotfix (v0.2.17)

  • fix /v1/embeddings and /v1/rerank crash — mx.compile on container-returning models (#282, #283)

v0.2.16 has been removed due to this bug. All v0.2.16 changes are included below.


New Models

  • Nemotron Super hybrid architecture support — layers_block_type config, MoE latent projections (upstream mlx-lm 73c8550)

Bug Fixes

  • fix GatedDelta/SSM state precision (Qwen3.5, Qwen3-next, Kimi-Linear) — state buffers now use float32 instead of input dtype for numerical correctness. upstream fix (735a43b)

    oMLX produces output token-identical to mlx-lm's BatchGenerator whether the SSD cache is on or off (verified on KVCache, ArraysCache, and ArraysCache MoE models).

  • fix BatchRotatingKVCache lazy evaluation ordering — left_padding/offset could hold stale values due to MLX deferred evaluation. upstream fix (89c430a) adds mx.depends() to enforce correct order

  • fix SuScaledRoPE/YarnRoPE mutating input arrays in-place — upstream fix (2146e4e) shallow-copies before modification

  • fix Qwen3-Coder tool parser crashing on Python-style quoted dicts — falls back to ast.literal_eval when json.loads fails (upstream ed69f83)

  • fix api_key exposed in stats response and integration CLI commands (#256)

  • fix redundant /admin/api/login calls on every stats poll — reuses admin session

  • fix runtime cache observability showing a misleading 0 indexed blocks for sub-block caches (<block_size); these caches are now surfaced (#256)
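
To illustrate why the GatedDelta/SSM state fix above moves state buffers to float32, here is a toy sketch of a decaying recurrent state whose value is rounded to a given precision after every step. It is not the mlx-lm kernel: the recurrence, the constants, and the helper names `round_to` and `run_recurrence` are assumptions made for illustration, and half precision is simulated with the standard library's `struct` module rather than with MLX arrays.

```python
import struct


def round_to(fmt: str, x: float) -> float:
    """Round x to the precision of a struct format: 'e' = float16, 'f' = float32, 'd' = float64."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def run_recurrence(fmt: str, steps: int = 2000, decay: float = 0.999, inp: float = 0.001) -> float:
    """Toy state update s <- decay * s + inp, storing s at the given precision each step."""
    state = 0.0
    for _ in range(steps):
        state = round_to(fmt, decay * state + inp)
    return state


# Double precision serves as the reference result.
reference = run_recurrence("d")
err_half = abs(run_recurrence("e") - reference)    # state stored as float16
err_single = abs(run_recurrence("f") - reference)  # state stored as float32

print(f"float16 state error: {err_half:.6f}")
print(f"float32 state error: {err_single:.6f}")
```

In this sketch the float16 state stops changing once the per-step increment falls below half the spacing between representable values, so it ends well short of the reference, while the float32 state stays close to it. The same kind of accumulation error is what a long-lived state buffer in a low-precision input dtype is exposed to.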
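
The Qwen3-Coder tool-parser fix above describes falling back to `ast.literal_eval` when `json.loads` fails. A minimal sketch of that pattern follows; the function name `parse_tool_arguments` is hypothetical and this is not the upstream parser code, only the fallback idea in isolation.

```python
import ast
import json
from typing import Any


def parse_tool_arguments(raw: str) -> Any:
    """Parse tool-call arguments, tolerating Python-style quoted dicts."""
    try:
        # Preferred path: strict JSON, e.g. {"path": "a.txt"}
        return json.loads(raw)
    except json.JSONDecodeError:
        # Fallback: Python literal syntax, e.g. {'path': 'a.txt', 'recursive': True}.
        # ast.literal_eval only accepts literals (dicts, lists, strings, numbers,
        # booleans, None); it does not execute arbitrary expressions.
        return ast.literal_eval(raw)


json_style = parse_tool_arguments('{"path": "a.txt", "recursive": true}')
python_style = parse_tool_arguments("{'path': 'a.txt', 'recursive': True}")
print(json_style == python_style)
```

Input that is neither valid JSON nor a valid Python literal still raises (`ValueError` or `SyntaxError` from `ast.literal_eval`), so a caller would need to handle that case separately.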

Dependency Updates

  • mlx-lm 4a21ffd (0.31.1) → 564281f (0.31.2)

New Contributors

  • @yes999zc made their first contribution in #256 and fixed the compile crash in #283

Thanks to @yes999zc for the contributions!

full changelog: v0.2.15...v0.2.17
