github jundot/omlx v0.3.6

7 hours ago

Highlights

Menubar icon no longer blocked by ControlCenter

For some users the menubar icon never appeared, or vanished after sleep/wake/restart. Fixed the underlying config conflict and reset the status item identity so blocked users get unblocked on upgrade, with a fallback dialog pointing to System Settings if the icon still doesn't show. Closes #725, #806.

float16 oQ option for M1/M2 prefill speedup

Added a dtype parameter to quantize_oq_streaming that casts floating-point tensors to the target dtype before mx.quantize, so the resulting scales and biases inherit the chosen dtype. It is exposed as a toggle in the admin UI's Advanced Settings (default: bfloat16). float16 yields roughly 20% faster prefill on M1/M2 thanks to native fp16 GPU support. Closes #604.
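As a rough illustration of why casting before quantization matters, here is a pure-Python emulation of affine group quantization in which the scale and bias are computed from already-cast values and therefore land in the chosen dtype. It does not call MLX; to_float16, to_bfloat16 and quantize_group are hypothetical helpers, the bfloat16 conversion truncates instead of rounding, and the real mx.quantize layout may differ.

```python
import struct


def to_float16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bfloat16(x: float) -> float:
    """Emulate bfloat16 by keeping the top 16 bits of the float32 encoding.

    Simplification: this truncates; real conversions usually round.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def quantize_group(values, bits=4, cast=to_bfloat16):
    """Affine-quantize one group of weights (hypothetical helper).

    Values are cast first, so the scale and bias derived from them are
    stored in the same dtype.
    """
    cast_values = [cast(v) for v in values]
    lo, hi = min(cast_values), max(cast_values)
    levels = (1 << bits) - 1
    scale = cast((hi - lo) / levels)
    if scale == 0.0:
        scale = 1.0  # constant group: any non-zero scale works
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in cast_values]
    return codes, scale, lo  # lo plays the role of the bias


group = [0.1, -0.25, 0.7, 0.33]
codes_bf16, scale_bf16, bias_bf16 = quantize_group(group, cast=to_bfloat16)
codes_fp16, scale_fp16, bias_fp16 = quantize_group(group, cast=to_float16)
```

In this sketch, switching the cast argument is the only change needed to move the scale and bias from bfloat16 to float16, which mirrors the single toggle described above.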

Jina Reranker V3 support

JinaForRanking now uses the upstream listwise hidden-state projector pipeline instead of score-token logit scoring, so multilingual reranking matches the model contract. Also classified as a directly supported reranker architecture to avoid false negatives from CausalLM directory-name heuristics. by @j-huang-rj (#745)

oQ streaming quantization for huge MoE models

Chunked load/quantize and a discovery-based streaming sanitizer let oQ process massive MoE checkpoints like Qwen3.5-397B-A17B directly on Apple Silicon with bounded peak RAM. Also added FP8 source support (MiniMax-M2.7, DeepSeek FP8). Tested end-to-end on M3 Ultra 512GB. by @yohann-bearzi (#737)
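The bounded-memory idea can be sketched as follows: load a few tensors at a time, transform them, write them out, and release them before touching the next chunk. This is a minimal sketch under assumptions, not the oQ implementation; chunked, stream_process, and the load/transform/save callbacks are hypothetical names, and plain dicts stand in for checkpoint files.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def chunked(names: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` tensor names."""
    chunk: List[str] = []
    for name in names:
        chunk.append(name)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def stream_process(
    names: Iterable[str],
    load: Callable[[str], list],
    transform: Callable[[list], list],
    save: Callable[[str, list], None],
    chunk_size: int = 2,
) -> int:
    """Process tensors chunk by chunk; return the peak number held at once."""
    peak = 0
    for chunk in chunked(names, chunk_size):
        loaded = {name: load(name) for name in chunk}  # only this chunk is resident
        peak = max(peak, len(loaded))
        for name, tensor in loaded.items():
            save(name, transform(tensor))
        del loaded  # drop references before loading the next chunk
    return peak


# Stand-ins for a source checkpoint and an output shard (assumptions).
source: Dict[str, list] = {f"layer.{i}.weight": [i + 0.26, i + 0.74] for i in range(5)}
output: Dict[str, list] = {}

peak_resident = stream_process(
    source.keys(),
    load=lambda name: list(source[name]),
    transform=lambda t: [round(v) for v in t],  # placeholder for quantization
    save=output.__setitem__,
    chunk_size=2,
)
```

With five tensors and a chunk size of two, no more than two tensors are held by the loop at any time, regardless of how many tensors the checkpoint contains.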

New Features

  • float16 dtype option for oQ quantization (#604)
  • Improved Serving Stats menubar layout with compact number display by @CHW0n9 (#779)
  • Generic discovery-based streaming sanitizer (replaces Qwen-specific _StreamingPlan)
  • FP8 source model support for oQ

Bug Fixes

  • Fix menubar icon blocked by ControlCenter on sleep/wake/restart (#725, #806)
  • Fix infinite tool-call loop caused by Gemma 4 tool_responses being attached to a separate message by @latent-variable (#799)
  • Fix text-only list content not being normalized to a string in VLM message formatting (#796)
  • Fix stale prefill progress tracker entry after external prefill
  • Narrow the Jina hidden-state extraction contract to the documented shape
  • Fix _LazyTensorIndex.pop() to materialize to mx.array, so third-party sanitizers that call mx.stack on popped tensors work
  • Fix handling of 0-dim scalars in _LazyTensor.__getitem__ and _materialize_source (needed for Gemma 4 scaling factors)
  • Fix _LazyTensorIndex.__iter__ and items() to include _overrides keys
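
The text-only list content fix listed above (#796) can be pictured with a small sketch: when a chat message's content is a list made up solely of text parts, it is collapsed into one string, while mixed content is left alone. This is an illustration under assumptions, not the project's code; normalize_content is a hypothetical name, the part shape follows the common {"type": "text", "text": ...} convention, and joining with a newline is an arbitrary choice.

```python
from typing import Any, Dict, List, Union

Content = Union[str, List[Dict[str, Any]]]


def normalize_content(content: Content) -> Content:
    """Collapse a list of text-only parts into a single string.

    Strings, empty lists, and lists containing non-text parts (for
    example images) are returned unchanged.
    """
    if (
        isinstance(content, list)
        and content
        and all(isinstance(p, dict) and p.get("type") == "text" for p in content)
    ):
        # Assumption: parts are joined with a newline.
        return "\n".join(str(p.get("text", "")) for p in content)
    return content


text_only = [{"type": "text", "text": "Hello"}, {"type": "text", "text": "world"}]
mixed = [
    {"type": "text", "text": "Describe this"},
    {"type": "image_url", "image_url": {"url": "file:///tmp/example.png"}},
]

flattened = normalize_content(text_only)
untouched = normalize_content(mixed)
```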

New Contributors

Full changelog: v0.3.5...v0.3.6
