## What's Changed
- Transformers v5 by @awni in #811
- Add LongCat Flash tool parser by @kernelpool in #810
- Add Kimi-K2.5 by @kernelpool in #813
- Bump mlx version and version by @awni in #816
- Fix NemotronH config compatibility with HuggingFace format by @LuqDaMan in #820
- Fix for Exception - MultiLinear.to_quantized() missing 'mode' by @inferencers in #809
- Fix Kimi K2.5 tool call handling by @kernelpool in #821
- Actually add cli by @awni in #823
- Add LongCat Flash Lite by @kernelpool in #819
- Fix mixed quant by @awni in #825
- Support distributed inference in the server by @angeloskath in #741
- fix cli by @solarpunkin in #827
- Enable loading custom models by @awni in #830
- Allow default creation of BatchRotatingKVCache instead of BatchKVCache in batch mode by @christian-lms in #834
- Add Step 3.5 Flash by @kernelpool in #836
- server: support chat_template_kwargs and top_logprobs by @percontation in #829
- fix: handle GLM 4.7 tool call fallbacks by @jalehman in #792
- Deepseek V3.2 implementation fixes by @sjug in #838
- Fix Step 3.5 Flash model conversion by @kernelpool in #840
- Fix batch mamba by @awni in #842
- Fix sliding window mask during generation by @kernelpool in #843
- DSV3 MLA by @awni in #839
## New Contributors
**Full Changelog**: v0.30.5...v0.30.6