github jundot/omlx v0.2.20.dev3

latest releases: v0.2.20, v0.2.20rc1
pre-release2 days ago

This is a pre-release build for testing purposes.

New Features

  • Native BERT/XLMRoBERTa embedding — load BERT-family embedding models (bge-m3, mxbai-embed) without mlx-embeddings fallback (#330 by @yes999zc)
  • Jina v3 reranker — reranking via <|score_token|> logits for jinaai/jina-reranker-v3-mlx (#331 by @yes999zc)
  • oQ3.5 quantization level — base 3-bit + expert down_proj 4-bit (~3.9 bpw)
  • oQ VLM support — quantize vision-language models with vision weight preservation
  • oQ FP8 model support — allow native FP8 models (MiniMax, DeepSeek) as quantization source
  • Partial mode — assistant message prefill support for Moonshot/Kimi K2 models (partial field + name field passthrough) (#306 by @blightbow)
  • Benchmark sample sizes — add 500/1000/2000 sample options for MMLU and HellaSwag
  • Benchmark comparison columns — show mode/sample and full dataset size in comparison table
  • Codex smart config merging — non-destructive config merge with reasoning model auto-detection (#249 by @JasonYeYuhe)
  • i18n normalization — normalize translation files against en.json with missing key detection (#247 by @xiaoran007)
  • Admin generating status — show generating status for active requests after prefill completes

Bug Fixes

  • fix oQ bf16→fp16 weight conversion causing 41% quantized value corruption
  • fix oQ mxfp4 uint8 scales being force-cast to fp16
  • fix oQ clip optimization mask dtype and position_ids for Qwen3.5
  • fix think prefix false positive for disabled thinking patterns (<think></think>)
  • fix responses API image support for VLM + missing prompt_tokens in completions usage
  • fix SSE streaming behind nginx reverse proxy (X-Accel-Buffering header) (#309)
  • fix CausalLM-based embedding model detection (Qwen3-Embedding) (#327)
  • fix admin unload tooltip clipping in active models box (#314)
  • fix admin 401 warning log spam from dashboard polling
  • fix admin model settings not showing for embedding/reranker models
  • fix PEP 735 dependency-groups for uv sync --dev (#305 by @blightbow)

New Contributors

Don't miss a new omlx release

NewReleases is sending notifications on new releases.