ml-explore/mlx-lm v0.31.2 on GitHub

Highlights

Caching system prompt and user messages for non-trimmable caches
Batch generator refactoring

What's Changed

Bump the patch version by @angeloskath in #959
Presence and frequency penalties by @angeloskath in #971
Eval self.left_padding whenever it is updated in BatchRotatingKVCache by @rltakashige in #960
Late binding caused incorrect cache checkpoint by @angeloskath in #976
Move to Metal-agnostic device_info by @angeloskath in #979
Fix CompletionsDataset mask_prompt crash by @eyupcanakman in #967
Bump the patch version by @angeloskath in #981
Fix test after latest MLX update by @angeloskath in #996
Clear cache trainer memory by @N8python in #986
feat(server): add --allowed-origins by @nwtgck in #987
Delta net precision by @angeloskath in #997
avoid mutating input in SuScaledRoPE and YarnRoPE by @mm65x in #1003
handle missing content-length header in server by @mm65x in #1001
fall back to ast.literal_eval for malformed JSON in qwen3_coder tool parser by @mm65x in #1004
Nemotron super support by @angeloskath in #992
Supporting delay in mlx_lm benchmark by @AndreasPlt in #1010
Fix flaky test by @angeloskath in #1020
Fix missing cache advance from Qwen 3.5 by @angeloskath in #1024
Refactor LRUPromptCache by @angeloskath in #1019
Fix SSM dt clamp default for Nemotron-H by @kernelpool in #1026
Inserting logits processors into BatchGenerator in batch_generate by @arthurhjorth in #1008
fix: break shared-buffer memory leak in GatedDeltaNet cache by @adurham in #1077
Fix PromptTrie.pop_prefixes() off-by-one when pruning immediate prefixes by @LxYuan0420 in #1078
Batch generation refactoring and various fixes by @angeloskath in #1072
perf: use max instead of argsort in apply_min_p sampling by @matteocelani in #1083
Add Gemma 4 by @Blaizzy in #1093
Bring back max-kv-size to the batch generator by @angeloskath in #1106
Add Gemma 4 tool call parser by @nicdavidson in #1105
Fix Gemma 4 quantized per-layer projection loading by @spicyneuron in #1112
Fix output corruption in speculative decoding by @kernelpool in #1109
Gemma4 final fixes and multi-token think/tool start/end by @angeloskath in #1114
Align batch logits processor token contract by @neilmehta24 in #1115
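
One of the fixes listed above (#1004) falls back to ast.literal_eval when a tool-call argument string is not valid JSON. The general pattern can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual qwen3_coder parser code: the function name parse_tool_arguments and the choice to return the raw string on failure are hypothetical.

```python
import ast
import json


def parse_tool_arguments(raw: str):
    """Parse a tool-call argument string, tolerating Python-style literals.

    Hypothetical helper for illustration only; not part of the mlx-lm API.
    """
    try:
        # Preferred path: strict JSON.
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    try:
        # Fallback: safely evaluate Python literals, which accepts
        # single-quoted strings and True/False/None.
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # Neither parser accepted the input; hand back the raw string
        # (an assumption made for this sketch).
        return raw
```

Models sometimes emit arguments such as `{'path': 'a.txt', 'force': True}`, which is a valid Python literal but not valid JSON. ast.literal_eval handles that case without executing arbitrary code, which is why it is a safer fallback than eval.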

New Contributors

@rltakashige made their first contribution in #960
@eyupcanakman made their first contribution in #967
@nwtgck made their first contribution in #987
@mm65x made their first contribution in #1003
@AndreasPlt made their first contribution in #1010
@arthurhjorth made their first contribution in #1008
@adurham made their first contribution in #1077
@LxYuan0420 made their first contribution in #1078
@matteocelani made their first contribution in #1083
@nicdavidson made their first contribution in #1105

Full Changelog: v0.31.0...v0.31.2