## What's Changed
- Transformers v5 by @awni in #811
- Add LongCat Flash tool parser by @kernelpool in #810
- Add Kimi-K2.5 by @kernelpool in #813
- Bump mlx version and version by @awni in #816
- Fix NemotronH config compatibility with HuggingFace format by @LuqDaMan in #820
- Fix for Exception - MultiLinear.to_quantized() missing 'mode' by @inferencers in #809
- Fix Kimi K2.5 tool call handling by @kernelpool in #821
- Actually add cli by @awni in #823
- Add LongCat Flash Lite by @kernelpool in #819
- Fix mixed quant by @awni in #825
- Support distributed inference in the server by @angeloskath in #741
- fix cli by @solarpunkin in #827
- Enable loading custom models by @awni in #830
- Allow default creation of BatchRotatingKVCache instead of BatchKVCache in batch mode by @christian-lms in #834
- Add Step 3.5 Flash by @kernelpool in #836
- server: support chat_template_kwargs and top_logprobs by @percontation in #829
- fix: handle GLM 4.7 tool call fallbacks by @jalehman in #792
- Deepseek V3.2 implementation fixes by @sjug in #838
- Fix Step 3.5 Flash model conversion by @kernelpool in #840
- Fix batch mamba by @awni in #842
- Fix sliding window mask during generation by @kernelpool in #843
- DSV3 MLA by @awni in #839
## New Contributors
**Full Changelog**: v0.30.5...v0.30.6