Dev release with Gemma 4 native tool calling, UI improvements, and several bug fixes. This is a dev build and may contain bugs. If you run into any issues, please open an issue.
Highlights
Gemma 4 native tool calling
Bumped mlx-lm to dcbf6e3 and mlx-vlm to 23e1dff. mlx-lm now ships a native Gemma 4 tool call parser (<|tool_call> / <tool_call|>) and multi-token think/tool token support. omlx's existing parse_tool_calls() picks up the new parser automatically — no server-side special-casing needed.
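As a rough illustration of the delimiters mentioned above, the sketch below pulls the raw text between <|tool_call> and <tool_call|> out of a model response. It is not the mlx-lm parser and not omlx's parse_tool_calls(); the helper name split_tool_calls and the sample payload are hypothetical, and the payload format between the delimiters is not covered in these notes, so it is returned as unparsed text.

```python
import re

# Delimiters as named in these release notes. What goes between them is not
# specified here, so this sketch leaves the payload as raw text.
TOOL_CALL_OPEN = "<|tool_call>"
TOOL_CALL_CLOSE = "<tool_call|>"

_TOOL_CALL_RE = re.compile(
    re.escape(TOOL_CALL_OPEN) + r"(.*?)" + re.escape(TOOL_CALL_CLOSE),
    re.DOTALL,  # a payload may span several lines
)


def split_tool_calls(text: str) -> tuple[str, list[str]]:
    """Hypothetical helper: return (text without tool-call spans, raw payloads)."""
    payloads = [p.strip() for p in _TOOL_CALL_RE.findall(text)]
    remaining = _TOOL_CALL_RE.sub("", text).strip()
    return remaining, payloads


# Placeholder payload, for illustration only.
sample = "Checking the weather. <|tool_call>PAYLOAD<tool_call|>"
remaining, payloads = split_tool_calls(sample)
print(remaining)
print(payloads)
```

In practice the native parser makes a helper like this unnecessary; it only shows where the tool call sits in the generated text.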
Removed the Gemma 4 multi-image vision monkey patch since mlx-vlm handles different-sized images natively now. The batched decode patch stays for now (mlx-vlm still uses cache.state based KV sharing).
Auto theme mode
The admin UI now has an auto/light/dark theme picker, and auto mode follows the system appearance. By @Stv-X (#621, #624)
New Features
- Reproducible generation via seed parameter (#640)
- 6-bit and 8-bit TurboQuant options (#594)
- Skip admin auth when skip_api_key_verification is enabled on localhost (#587)
- Unify max_num_seqs and completion_batch_size into max_concurrent_requests
Bug Fixes
- Fix TurboQuant SSD cache reconstruction crash (#577)
- Fix oQ: skip quantizing modules without to_quantized() (#625)
- Fix oQ: unwrap tuple layer outputs in sensitivity measurement (#627)
- Fix dark mode heading visibility and mobile UX in chat page (#586)
- Fix navbar theme picker width clipping
- Fix Homebrew formula sha256 for v0.3.4 (#589)
- Fix trigger formula update on release publish instead of tag push
- Bundle spacy en_core_web_sm for Kokoro TTS in DMG build (#590)
- Add missing TTS/STT/STS deps to audio extra (#590)
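The two oQ fixes in the list above follow a common defensive pattern, sketched here with stand-in classes. This is not the actual oQ code: apart from to_quantized(), every name below (first_output, quantizable, the stub classes) is hypothetical, and the assumption that the first tuple element is the primary output is ours.

```python
from typing import Any


def first_output(out: Any) -> Any:
    """If a layer returns a tuple, assume the first element is the main output."""
    return out[0] if isinstance(out, tuple) else out


def quantizable(modules: dict[str, Any]) -> dict[str, Any]:
    """Keep only modules that expose a callable to_quantized()."""
    return {
        name: module
        for name, module in modules.items()
        if callable(getattr(module, "to_quantized", None))
    }


class LinearStub:
    """Stand-in for a module that supports quantization."""

    def to_quantized(self) -> "LinearStub":
        return self


class NormStub:
    """Stand-in for a module with no to_quantized() method."""


modules = {"proj": LinearStub(), "norm": NormStub()}
print(list(quantizable(modules)))          # only "proj" survives the filter
print(first_output(("hidden", "cache")))   # tuple output is unwrapped
print(first_output("hidden"))              # non-tuple output passes through
```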
Dependencies
- Bump mlx-lm to dcbf6e3 (Gemma 4 tool call parser, multi-token think/tool)
- Bump mlx-vlm to 23e1dff (Gemma 4 multi-image fix, nested tool parser)
- Add regex dependency
New Contributors
Full changelog: v0.3.4...v0.3.5.dev1