This is a pre-release build for testing purposes.
New Features
- Native BERT/XLMRoBERTa embedding — load BERT-family embedding models (bge-m3, mxbai-embed) without mlx-embeddings fallback (#330 by @yes999zc)
- Jina v3 reranker — reranking via `<|score_token|>` logits for jinaai/jina-reranker-v3-mlx (#331 by @yes999zc)
- oQ3.5 quantization level — base 3-bit + expert down_proj 4-bit (~3.9 bpw)
- oQ VLM support — quantize vision-language models with vision weight preservation
- oQ FP8 model support — allow native FP8 models (MiniMax, DeepSeek) as quantization source
- Partial mode — assistant message prefill support for Moonshot/Kimi K2 models (`partial` field + `name` field passthrough) (#306 by @blightbow)
- Benchmark sample sizes — add 500/1000/2000 sample options for MMLU and HellaSwag
- Benchmark comparison columns — show mode/sample and full dataset size in comparison table
- Codex smart config merging — non-destructive config merge with reasoning model auto-detection (#249 by @JasonYeYuhe)
- i18n normalization — normalize translation files against en.json with missing key detection (#247 by @xiaoran007)
- Admin generating status — show generating status for active requests after prefill completes
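As a rough illustration of the non-destructive merge idea behind the Codex smart config merging feature (names and structure here are assumptions for the sketch, not the project's actual code): user-set keys are kept as-is, and only missing keys are filled in from defaults, recursing into nested tables.

```python
def merge_config(existing: dict, defaults: dict) -> dict:
    """Non-destructive config merge sketch: existing values always win,
    defaults only fill in keys the user has not set."""
    merged = dict(existing)
    for key, value in defaults.items():
        if key not in merged:
            # Key absent from the user's config: take the default.
            merged[key] = value
        elif isinstance(merged[key], dict) and isinstance(value, dict):
            # Both sides are tables: merge recursively instead of overwriting.
            merged[key] = merge_config(merged[key], value)
        # Otherwise keep the user's value untouched.
    return merged

# Example: the user's model choice and temperature survive; only the
# missing top_p default is added.
user_cfg = {"model": "local", "sampling": {"temp": 0.2}}
defaults = {"model": "default", "sampling": {"temp": 1.0, "top_p": 0.9}}
merged = merge_config(user_cfg, defaults)
```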
Bug Fixes
- fix oQ bf16→fp16 weight conversion causing 41% quantized value corruption
- fix oQ mxfp4 uint8 scales being force-cast to fp16
- fix oQ clip optimization mask dtype and position_ids for Qwen3.5
- fix think prefix false positive for disabled thinking patterns (`<think></think>`)
- fix responses API image support for VLM + missing prompt_tokens in completions usage
- fix SSE streaming behind nginx reverse proxy (X-Accel-Buffering header) (#309)
- fix CausalLM-based embedding model detection (Qwen3-Embedding) (#327)
- fix admin unload tooltip clipping in active models box (#314)
- fix admin 401 warning log spam from dashboard polling
- fix admin model settings not showing for embedding/reranker models
- fix PEP 735 dependency-groups for `uv sync --dev` (#305 by @blightbow)
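Background on the nginx SSE fix: nginx buffers proxied responses by default, which stalls incremental delivery of server-sent events; the standard remedy is to send `X-Accel-Buffering: no` on the streaming response. A minimal sketch of the headers involved (the helper name is hypothetical, not the project's actual code):

```python
def sse_headers() -> dict[str, str]:
    """Headers for an SSE response that should stream unbuffered
    through an nginx reverse proxy."""
    return {
        "Content-Type": "text/event-stream",
        "Cache-Control": "no-cache",   # keep intermediaries from caching events
        "X-Accel-Buffering": "no",     # tell nginx not to buffer this response
    }

headers = sse_headers()
```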
New Contributors
- @blightbow made their first contribution in #305
- @yes999zc made their first contribution in #330
- @JasonYeYuhe made their first contribution in #249
- @xiaoran007 made their first contribution in #247