Download the DMG that matches your macOS version (Sequoia or Tahoe).
If you're on an M5 Mac, you must use the `macos26-tahoe` DMG for M5 Neural Accelerator support.
Hotfix (v0.2.17)
v0.2.16 has been removed due to this bug. All v0.2.16 changes are included below.
New Models
- Nemotron Super hybrid architecture support — `layers_block_type` config, MoE latent projections (upstream mlx-lm 73c8550)
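Hybrid architectures like this one declare, per layer, which block type to build. The sketch below illustrates the general dispatch pattern a `layers_block_type` list enables; the block-type names and dispatch logic here are assumptions for illustration, not oMLX's or mlx-lm's actual implementation.

```python
# Hypothetical sketch: building a hybrid model from a per-layer
# block-type list such as Nemotron's `layers_block_type` config.
# The values "attention"/"mamba" are assumed for illustration.

def build_layers(config: dict) -> list:
    layers = []
    for i, block_type in enumerate(config["layers_block_type"]):
        if block_type == "attention":
            layers.append(("attention", i))  # stand-in for an attention block
        elif block_type == "mamba":
            layers.append(("mamba", i))      # stand-in for an SSM/Mamba block
        else:
            raise ValueError(f"unknown block type: {block_type}")
    return layers

config = {"layers_block_type": ["attention", "mamba", "mamba", "attention"]}
print(build_layers(config))
```

The point of the pattern is that one config key fully determines the layer stack, so loaders can support new hybrid mixes without new model classes.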
Bug Fixes
- fix GatedDelta/SSM state precision (Qwen3.5, Qwen3-next, Kimi-Linear) — state buffers now use float32 instead of the input dtype for numerical correctness. upstream fix (735a43b)
  - oMLX now produces token-identical output to mlx-lm's BatchGenerator regardless of whether the SSD cache is on or off (verified on KVCache, ArraysCache, and ArraysCache MoE models)
- fix BatchRotatingKVCache lazy evaluation ordering — `left_padding`/`offset` could hold stale values due to MLX deferred evaluation. upstream fix (89c430a) adds `mx.depends()` to enforce correct ordering
- fix SuScaledRoPE/YarnRoPE mutating input arrays in-place — upstream fix (2146e4e) shallow-copies before modification
- fix Qwen3-Coder tool parser crashing on Python-style quoted dicts — falls back to `ast.literal_eval` when `json.loads` fails (upstream ed69f83)
- fix `api_key` exposed in stats response and integration CLI commands (#256)
- fix redundant `/admin/api/login` calls on every stats poll — reuses the admin session instead
- fix sub-block cache (`< block_size`) now surfaced in runtime cache observability instead of showing a misleading `0 indexed blocks` (#256)
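The SSM state-precision fix above is an instance of a general failure mode, which can be shown with plain NumPy (this is an illustration of the problem class, not the actual GatedDelta/SSM code): accumulating many small recurrent updates in the input dtype (e.g. float16) drifts or stalls, while a float32 state buffer stays accurate.

```python
import numpy as np

# Simulate a recurrent state updated with many small deltas, as an
# SSM-style cache does. Keeping the running state in float16 (the
# input dtype) loses the updates once the state grows; a float32
# state buffer accumulates correctly.
delta = np.float16(0.001)

state_f16 = np.float16(0.0)
state_f32 = np.float32(0.0)
for _ in range(10_000):
    state_f16 = np.float16(state_f16 + delta)        # rounds every step
    state_f32 = np.float32(state_f32 + np.float32(delta))

print(float(state_f16))  # far from 10.0: updates stall once spacing > delta
print(float(state_f32))  # close to 10.0
```

This is why the fix keeps state buffers in float32 regardless of the model's compute dtype.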
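The BatchRotatingKVCache fix guards against a general lazy-evaluation hazard: reading a value that a deferred computation has not yet written. MLX's `mx.depends()` enforces ordering on its computation graph; the toy simulation below only mimics the idea with an explicit thunk and is not the upstream code.

```python
class Lazy:
    """A deferred computation: nothing runs until eval() is called."""
    def __init__(self, fn):
        self.fn = fn
        self.done = False
    def eval(self):
        if not self.done:
            self.fn()
            self.done = True

state = {"offset": 0}
update = Lazy(lambda: state.update(offset=8))  # deferred write to the cache

# Buggy order: read before forcing the update, so the value is stale.
stale = state["offset"]   # 0

# Fixed order: force the dependency first (the role mx.depends() plays).
update.eval()
fresh = state["offset"]   # 8
print(stale, fresh)
```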
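The RoPE fix follows a standard rule: never mutate a caller's array in place; copy before modifying. A minimal NumPy sketch of the pattern (function names are assumed, not the upstream code):

```python
import numpy as np

def scale_in_place(x: np.ndarray, factor: float) -> np.ndarray:
    x *= factor          # buggy: mutates the caller's array
    return x

def scale_copy(x: np.ndarray, factor: float) -> np.ndarray:
    x = x.copy()         # fixed: work on a copy; caller's data is untouched
    x *= factor
    return x

a = np.ones(3)
scale_in_place(a, 2.0)
print(a)    # [2. 2. 2.]  (caller's buffer was clobbered)

b = np.ones(3)
out = scale_copy(b, 2.0)
print(b)    # [1. 1. 1.]  (untouched)
print(out)  # [2. 2. 2.]
```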
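The tool-parser fix uses a common fallback pattern: try strict JSON first, then parse Python-style literals (single quotes, `True`/`None`) with `ast.literal_eval`, which is safe because it only evaluates literals. A self-contained sketch of the fallback, not the actual parser code:

```python
import ast
import json

def parse_args(raw: str):
    """Parse tool-call arguments that may be JSON or a Python literal."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Python-style dicts like {'a': 1} are invalid JSON but are
        # valid (and safe) to parse with ast.literal_eval.
        return ast.literal_eval(raw)

print(parse_args('{"path": "main.py"}'))   # valid JSON
print(parse_args("{'path': 'main.py'}"))   # Python-style quotes
```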
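The `api_key` fix reflects a standard redaction pattern: mask secret fields before a stats payload leaves the server. The field names and shape below are assumptions for illustration, not oMLX's actual response schema.

```python
SENSITIVE_KEYS = {"api_key"}  # assumed field name, per the fix above

def redact(payload: dict) -> dict:
    """Return a copy of a stats payload with sensitive fields masked."""
    return {
        k: ("***" if k in SENSITIVE_KEYS else v)
        for k, v in payload.items()
    }

stats = {"requests": 42, "api_key": "sk-secret"}
print(redact(stats))  # {'requests': 42, 'api_key': '***'}
```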
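The login-polling fix is a simple session-caching pattern: authenticate once, then reuse the session for every subsequent poll. A hypothetical sketch with assumed names; the real client talks to `/admin/api/login` over HTTP.

```python
class AdminClient:
    """Hypothetical client that reuses one admin session across polls."""

    def __init__(self):
        self._session = None
        self.login_calls = 0

    def _login(self) -> str:
        self.login_calls += 1        # stand-in for POST /admin/api/login
        return "session-token"

    def poll_stats(self) -> str:
        if self._session is None:    # log in only on first use
            self._session = self._login()
        return f"stats via {self._session}"

client = AdminClient()
for _ in range(5):
    client.poll_stats()
print(client.login_calls)  # 1, not one login per poll
```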
Dependency Updates
- mlx-lm `4a21ffd` (0.31.1) → `564281f` (0.31.2)
New Contributors
Thanks to @yes999zc for their contributions!
Full changelog: v0.2.15...v0.2.17