jundot/omlx v0.2.7 on GitHub

What's New

HuggingFace mirror endpoint support: configure a custom HF mirror endpoint for regions with restricted access to huggingface.co. Applies to model downloads, search, and all Hub API calls. (#116)
Dashboard tab persistence: selected dashboard tabs are now persisted in URL query params, so refreshing the page or sharing a link keeps your current view. (#129)
Extended metrics reference: batch size, speedup ratio, and per-request prefill TPS (tokens per second) are now included in the metrics reference panel. (#101)
mlx-lm upgraded to v0.31.1: updated to commit 4a21ffd for the latest model support and bug fixes.
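The mirror endpoint support (#116) amounts to resolving one Hub base URL and building every Hub request from it. The sketch below is a minimal illustration, assuming a precedence of explicit setting, then the HF_ENDPOINT environment variable (which huggingface_hub itself honors), then https://huggingface.co. resolve_hf_endpoint, hub_api_url, and the mirror.example.com host are hypothetical and not taken from omlx's code or settings.

```python
import os
from typing import Optional

# Public Hub base URL, used when no mirror is configured.
DEFAULT_HF_ENDPOINT = "https://huggingface.co"


def resolve_hf_endpoint(configured: Optional[str] = None) -> str:
    """Pick the Hub base URL: explicit setting, then HF_ENDPOINT, then the default.

    `configured` stands in for a hypothetical mirror setting; it is not
    omlx's real option name.
    """
    endpoint = configured or os.environ.get("HF_ENDPOINT") or DEFAULT_HF_ENDPOINT
    # Strip trailing slashes so joined paths never contain "//".
    return endpoint.rstrip("/")


def hub_api_url(endpoint: str, path: str) -> str:
    """Join a Hub path such as "/api/models" onto the resolved base URL."""
    return f"{endpoint}/{path.lstrip('/')}"


if __name__ == "__main__":
    base = resolve_hf_endpoint("https://mirror.example.com/")
    print(hub_api_url(base, "/api/models"))  # https://mirror.example.com/api/models
```

Routing downloads, search, and other Hub API calls through a single resolver like this is what lets one setting redirect all of them.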

Bug Fixes

Streaming with tool calls: content is now streamed token-by-token even when tools are present, instead of buffering the entire response. (#103)
Model alias settings lookup: model aliases are now resolved to the underlying model before per-model settings (temperature, max tokens, etc.) are looked up. (#117)
Cache corruption infinite loop: cache corruption during prefill no longer causes an infinite retry loop. The corrupted cache is now cleared and prefill restarts cleanly.
Requests dict leak on cache failure: fail_all_requests no longer triggers a full cache reset and now properly cleans up the requests dictionary.
HuggingFace API timeouts: added timeouts to all HuggingFace Hub API calls to prevent the server from freezing when HF is unreachable. (#124)
Qwen3/Gemma3 misidentified as embedding models: LLMs with certain architectures, including Qwen3 and Gemma3, were incorrectly classified as embedding models. (#130)
macOS 15.0+ requirement enforced: MLX >= 0.29.2 requires macOS 15.0 (Sequoia). The app now checks the macOS version at startup and enforces this requirement. (#125)
i18n language setting not persisting: a language selected before server initialization was lost once initialization completed. (#119)
Anthropic tool-call filtering: added a fallback safeguard for edge cases in the Anthropic adapter's tool-call handling.
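The alias fix (#117) comes down to ordering: resolve the alias first, then look up settings. The sketch below is a minimal illustration, assuming a one-level alias-to-model-id mapping and a settings dict keyed by model id; resolve_alias, lookup_settings, and the example names are hypothetical and not taken from omlx's source.

```python
from typing import Any, Dict


def resolve_alias(requested: str, aliases: Dict[str, str]) -> str:
    """Map an alias to its model id; names that are not aliases pass through."""
    return aliases.get(requested, requested)


def lookup_settings(
    requested: str,
    aliases: Dict[str, str],
    settings: Dict[str, Dict[str, Any]],
) -> Dict[str, Any]:
    """Return per-model settings, resolving the alias before the lookup.

    In this sketch settings are keyed by the real model id, so looking up
    the alias itself would find nothing.
    """
    model_id = resolve_alias(requested, aliases)
    return settings.get(model_id, {})


if __name__ == "__main__":
    # Hypothetical alias and model id, for illustration only.
    aliases = {"fast": "example-org/example-model-4bit"}
    settings = {"example-org/example-model-4bit": {"temperature": 0.2, "max_tokens": 512}}
    print(lookup_settings("fast", aliases, settings))  # {'temperature': 0.2, 'max_tokens': 512}
```

Requests made by the full model id still work unchanged, because names that are not aliases pass through resolve_alias as-is.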

Thanks to @TipKnuckle, @jonsnowljs, and @rsnow for their contributions!