jundot/omlx v0.3.6 on GitHub

Highlights

Menubar icon no longer blocked by ControlCenter

For some users the menubar icon never appeared, or vanished after sleep/wake/restart. Fixed the underlying config conflict and reset the status item identity so blocked users get unblocked on upgrade, with a fallback dialog pointing to System Settings if the icon still doesn't show. Closes #725, #806.

float16 oQ option for M1/M2 prefill speedup

Added a dtype parameter to quantize_oq_streaming that casts floating-point tensors to the target dtype before mx.quantize, so scales and biases inherit the chosen dtype. Exposed as a toggle in the admin UI under Advanced Settings (bfloat16 remains the default). float16 yields ~20% faster prefill on M1/M2 thanks to native fp16 GPU support. Closes #604.
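The cast-before-quantize idea can be illustrated without MLX. The sketch below is not the omlx implementation: `to_float16`, `to_bfloat16`, and `quantize_group` are hypothetical helpers, and the scale/bias formula is an assumption based on the affine scheme described in the `mx.quantize` documentation (scale = (max - min) / (2^bits - 1), bias = min). It only shows why casting first makes the stored scale and bias representable in the chosen dtype.

```python
import struct


def to_float16(x: float) -> float:
    """Round a Python float to the nearest IEEE float16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bfloat16(x: float) -> float:
    """Round to bfloat16: keep the top 16 bits of float32, round-to-nearest-even."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def quantize_group(weights, bits, cast):
    """Hypothetical affine quantizer for one group of weights.

    The weights are cast to the target dtype first, so the derived
    scale and bias are values of that dtype as well.
    """
    w = [cast(v) for v in weights]            # cast before quantizing
    lo, hi = min(w), max(w)
    levels = (1 << bits) - 1
    scale = cast((hi - lo) / levels) or 1.0   # avoid a zero scale for flat groups
    bias = lo                                 # already a value of the target dtype
    codes = [min(levels, max(0, round((v - bias) / scale))) for v in w]
    return codes, scale, bias


group = [0.1, -0.3, 0.7, 0.25]
codes_f16, scale_f16, bias_f16 = quantize_group(group, 4, to_float16)
codes_bf16, scale_bf16, bias_bf16 = quantize_group(group, 4, to_bfloat16)
```

With the same input group, the two calls generally produce slightly different scale and bias values, because each is rounded to its own dtype's grid.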

Jina Reranker V3 support

JinaForRanking now uses the upstream listwise hidden-state projector pipeline instead of score-token logit scoring, so multilingual reranking matches the model contract. It is also classified as a directly supported reranker architecture, which avoids false negatives from the CausalLM directory-name heuristics. by @j-huang-rj (#745)
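As a rough illustration of listwise hidden-state scoring, the sketch below ranks documents by comparing projected hidden states rather than reading a score-token logit. It is a toy under stated assumptions, not the upstream pipeline: `project`, `cosine`, and `rank_listwise` are hypothetical names, the projector is reduced to a single linear map, and cosine similarity is assumed as the scoring function.

```python
import math


def project(hidden, weight):
    """Apply a linear projector; each row of `weight` produces one output dim."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in weight]


def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def rank_listwise(query_hidden, doc_hiddens, weight):
    """Score every document against the query in one pass and sort by score."""
    q = project(query_hidden, weight)
    scores = [cosine(q, project(d, weight)) for d in doc_hiddens]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


identity = [[1.0, 0.0], [0.0, 1.0]]           # stand-in projector weights
order, scores = rank_listwise(
    [1.0, 0.0],                               # query hidden state
    [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]],     # one hidden state per document
    identity,
)
```

Here the second document is aligned with the query and ranks first, and the orthogonal first document ranks last.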

oQ streaming quantization for huge MoE models

Chunked load/quantize and a discovery-based streaming sanitizer let oQ process massive MoE checkpoints like Qwen3.5-397B-A17B directly on Apple Silicon with bounded peak RAM. Also added FP8 source support (MiniMax-M2.7, DeepSeek FP8). Tested end-to-end on M3 Ultra 512GB. by @yohann-bearzi (#737)
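The bounded-RAM behaviour comes from never holding more than one chunk of tensors at a time. The following is a minimal sketch of that pattern, not the code from #737: `chunked`, `stream_quantize`, and the `load`/`quantize`/`save` callbacks are hypothetical placeholders for the real checkpoint reader, quantizer, and writer.

```python
from typing import Callable, Iterator, List


def chunked(names: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` tensor names."""
    for start in range(0, len(names), size):
        yield names[start:start + size]


def stream_quantize(
    names: List[str],
    load: Callable[[str], object],
    quantize: Callable[[object], object],
    save: Callable[[str, object], None],
    chunk_size: int,
) -> int:
    """Load, quantize, and save tensors one chunk at a time.

    Returns the largest number of source tensors that were resident at once,
    which never exceeds `chunk_size`.
    """
    peak = 0
    for chunk in chunked(names, chunk_size):
        loaded = {name: load(name) for name in chunk}   # only this chunk is in memory
        peak = max(peak, len(loaded))
        for name, tensor in loaded.items():
            save(name, quantize(tensor))
        del loaded                                      # release before the next chunk
    return peak


# Toy usage: ten fake expert weights, processed four at a time.
names = [f"experts.{i}.weight" for i in range(10)]
written = {}
peak = stream_quantize(
    names,
    load=lambda name: [1.0, 2.0, 3.0],
    quantize=lambda tensor: [round(v) for v in tensor],
    save=written.__setitem__,
    chunk_size=4,
)
```

Peak memory then scales with the chunk size rather than with the total checkpoint size, which is the property the release note describes.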

New Features

float16 dtype option for oQ quantization (#604)
Improved Serving Stats menubar layout with compact number display by @CHW0n9 (#779)
Generic discovery-based streaming sanitizer (replaces Qwen-specific _StreamingPlan)
FP8 source model support for oQ
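
For the compact number display, one possible formatting rule is sketched below. This is an assumption about the behaviour, not the code from #779: the `compact` helper, its suffixes, and the choice to truncate rather than round are all hypothetical.

```python
def compact(n: int) -> str:
    """Format an integer compactly, e.g. 1234 -> "1.2K".

    Uses integer arithmetic and truncates to one decimal place, so a value
    just below a boundary (999_950) never shows up as "1000K".
    """
    sign = "-" if n < 0 else ""
    magnitude = abs(n)
    for threshold, suffix in ((10**9, "B"), (10**6, "M"), (10**3, "K")):
        if magnitude >= threshold:
            tenths = magnitude * 10 // threshold
            whole, frac = divmod(tenths, 10)
            digits = f"{whole}.{frac}" if frac else str(whole)
            return f"{sign}{digits}{suffix}"
    return f"{sign}{magnitude}"
```

Under this rule `compact(1234)` gives `1.2K` and `compact(1_500_000)` gives `1.5M`, while values below 1000 are shown unchanged.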

Bug Fixes

Fix menubar icon blocked by ControlCenter on sleep/wake/restart (#725, #806)
Fix Gemma 4 tool_responses being attached to a separate message, which caused an infinite tool-call loop by @latent-variable (#799)
Fix text-only list content not normalized to string in VLM message formatting (#796)
Fix stale prefill progress tracker entry after external prefill
Fix Jina hidden-state extraction contract narrowed to the documented shape
Fix _LazyTensorIndex.pop() to materialize to mx.array so that third-party sanitizers calling mx.stack on popped tensors work
Fix _LazyTensor.__getitem__ and _materialize_source handling of 0-dim scalars (needed for Gemma 4 scaling factors)
Fix _LazyTensorIndex.__iter__ and items() including _overrides keys
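
The lazy-index fixes above can be pictured with a small stand-in class. This is not omlx's `_LazyTensorIndex`: `LazyIndex` is a hypothetical model that assumes the intended behaviour is for `pop()` to return a materialized value and for iteration and `items()` to cover keys that exist only as overrides.

```python
from typing import Callable, Dict, Iterator, Tuple


class LazyIndex:
    """Hypothetical mapping whose values are loaded on first access."""

    def __init__(self, loaders: Dict[str, Callable[[], object]]):
        self._loaders = dict(loaders)             # key -> zero-argument loader
        self._overrides: Dict[str, object] = {}   # values set after construction

    def __setitem__(self, key: str, value: object) -> None:
        self._overrides[key] = value

    def __getitem__(self, key: str) -> object:
        if key in self._overrides:
            return self._overrides[key]
        return self._loaders[key]()               # materialize on demand

    def __iter__(self) -> Iterator[str]:
        yield from self._loaders
        for key in self._overrides:               # include override-only keys
            if key not in self._loaders:
                yield key

    def items(self) -> Iterator[Tuple[str, object]]:
        for key in self:
            yield key, self[key]

    def pop(self, key: str) -> object:
        value = self[key]                         # materialize before removing
        self._loaders.pop(key, None)
        self._overrides.pop(key, None)
        return value


index = LazyIndex({"a": lambda: 1, "b": lambda: 2})
index["c"] = 3                                    # exists only as an override
keys_before = list(index)
popped = index.pop("a")
```

A caller that pops a value gets a concrete object rather than a lazy handle, and a caller that iterates sees the override-only key `c` alongside the lazily loaded ones.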

New Contributors

@yohann-bearzi - oQ chunked load/quantize and streaming VLM sanitizer for huge MoE models (#737)
@j-huang-rj - Jina Reranker V3 listwise scoring (#745)
@CHW0n9 - Serving Stats menubar layout polish (#779)

Full changelog: v0.3.5...v0.3.6