ml-explore/mlx-lm v0.31.3 on GitHub

Highlights

Bump the patch version by @angeloskath in #1124
Fix batch dimension mismatch in BatchKVCache and BatchRotatingKVCache extend() by @razorback16 in #1141
Fix parallel tool call handling in server by @kernelpool in #1170
Fix MiniMax M2 parallel tool calling by @kernelpool in #1171
Fix missing tree_reduce import in models/cache.py by @siiea-ai in #1165
Apertus tie_word_embeddings fix by @BlackSamorez in #1143
Fix batch dimension mismatch in ArraysCache extend() by @techtoboggan in #1169
Fix dwq: check for actual safetensors in target_dir by @micuentadecasa in #1173
fix: handle NoneType check for think tokens in TokenizerWrapper by @yuetyeelo2855 in #1167
Fix Gemma4 tool parser: support hyphenated function names and braces in string args by @AkashKhamkar in #1150
Fix empty tool_call_end breaking Mistral tool calls by @eyupcanakman in #1151
Fix ArraysCache extend by @angeloskath in #1177
Fix Gemma 4 KV-shared layers creating unused projections by @glyphVault in #1158
Thread local generation stream by @angeloskath in #1090

Full Changelog: v0.31.2...v0.31.3