Highlights
- Many bug fixes across batch KV caches, tool-call parsing, and model definitions
- Thread-local generation stream to accompany MLX v0.31.2
What's Changed
- Bump the patch version by @angeloskath in #1124
- Fix batch dimension mismatch in BatchKVCache and BatchRotatingKVCache extend() by @razorback16 in #1141
- Fix parallel tool call handling in server by @kernelpool in #1170
- Fix MiniMax M2 parallel tool calling by @kernelpool in #1171
- Fix missing tree_reduce import in models/cache.py by @siiea-ai in #1165
- Apertus tie_word_embeddings fix by @BlackSamorez in #1143
- Fix batch dimension mismatch in ArraysCache extend() by @techtoboggan in #1169
- Fix dwq: check for actual safetensors in target_dir by @micuentadecasa in #1173
- fix: handle NoneType check for think tokens in TokenizerWrapper by @yuetyeelo2855 in #1167
- Fix Gemma4 tool parser: support hyphenated function names and braces in string args by @AkashKhamkar in #1150
- Fix empty tool_call_end breaking Mistral tool calls by @eyupcanakman in #1151
- Fix ArraysCache extend by @angeloskath in #1177
- Fix Gemma 4 KV-shared layers creating unused projections by @glyphVault in #1158
- Thread local generation stream by @angeloskath in #1090
New Contributors
- @razorback16 made their first contribution in #1141
- @siiea-ai made their first contribution in #1165
- @BlackSamorez made their first contribution in #1143
- @techtoboggan made their first contribution in #1169
- @micuentadecasa made their first contribution in #1173
- @yuetyeelo2855 made their first contribution in #1167
- @AkashKhamkar made their first contribution in #1150
- @glyphVault made their first contribution in #1158
Full Changelog: v0.31.2...v0.31.3