Highlights
- Caching system prompt and user messages for non-trimmable caches
- Batch generator refactoring
What's Changed
- Bump the patch version by @angeloskath in #959
- Presence and frequency penalties by @angeloskath in #971
- Eval self.left_padding whenever it is updated in BatchRotatingKVCache by @rltakashige in #960
- Late binding caused incorrect cache checkpoint by @angeloskath in #976
- Move to Metal-agnostic device_info by @angeloskath in #979
- Fix CompletionsDataset mask_prompt crash by @eyupcanakman in #967
- Bump the patch version by @angeloskath in #981
- Fix test after latest MLX update by @angeloskath in #996
- Clear cache trainer memory by @N8python in #986
- feat(server): add --allowed-origins by @nwtgck in #987
- Delta net precision by @angeloskath in #997
- Avoid mutating input in SuScaledRoPE and YarnRoPE by @mm65x in #1003
- Handle missing content-length header in server by @mm65x in #1001
- Fall back to ast.literal_eval for malformed JSON in qwen3_coder tool parser by @mm65x in #1004
- Nemotron super support by @angeloskath in #992
- Supporting delay in mlx_lm benchmark by @AndreasPlt in #1010
- Fix flaky test by @angeloskath in #1020
- Fix missing cache advance from Qwen 3.5 by @angeloskath in #1024
- Refactor LRUPromptCache by @angeloskath in #1019
- Fix SSM dt clamp default for Nemotron-H by @kernelpool in #1026
- Inserting logits processors into BatchGenerator in batch_generate by @arthurhjorth in #1008
- fix: break shared-buffer memory leak in GatedDeltaNet cache by @adurham in #1077
- Fix PromptTrie.pop_prefixes() off-by-one when pruning immediate prefixes by @LxYuan0420 in #1078
- Batch generation refactoring and various fixes by @angeloskath in #1072
- perf: use max instead of argsort in apply_min_p sampling by @matteocelani in #1083
- Add Gemma 4 by @Blaizzy in #1093
- Bring back max-kv-size to the batch generator by @angeloskath in #1106
- Add Gemma 4 tool call parser by @nicdavidson in #1105
- Fix Gemma 4 quantized per-layer projection loading by @spicyneuron in #1112
- Fix output corruption in speculative decoding by @kernelpool in #1109
- Gemma 4 final fixes and multi-token think/tool start/end by @angeloskath in #1114
- Align batch logits processor token contract by @neilmehta24 in #1115
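
The `ast.literal_eval` fallback listed above (#1004) follows a common pattern for tool-call arguments that a model emits as Python literals rather than strict JSON (single quotes, `True`/`None`). The sketch below illustrates that general pattern only. It is not the mlx-lm parser code, and the helper name `parse_tool_argument` is hypothetical.

```python
import ast
import json


def parse_tool_argument(raw: str):
    """Hypothetical helper: parse a tool-call argument string.

    Strict JSON is tried first. If that fails, the string is treated as a
    Python literal. If both fail, the raw string is returned unchanged.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    try:
        # literal_eval only accepts literals (dicts, lists, strings,
        # numbers, booleans, None); it does not execute arbitrary code.
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError, TypeError):
        return raw


# Valid JSON is handled by json.loads.
print(parse_tool_argument('{"path": "a.txt", "recursive": true}'))

# Python-style literals are rejected by json.loads and handled by the fallback.
print(parse_tool_argument("{'path': 'a.txt', 'recursive': True}"))
```

Trying JSON first keeps well-formed arguments on the strict path; the fallback only applies to input that `json.loads` rejects.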
New Contributors
- @rltakashige made their first contribution in #960
- @eyupcanakman made their first contribution in #967
- @nwtgck made their first contribution in #987
- @mm65x made their first contribution in #1003
- @AndreasPlt made their first contribution in #1010
- @arthurhjorth made their first contribution in #1008
- @adurham made their first contribution in #1077
- @LxYuan0420 made their first contribution in #1078
- @matteocelani made their first contribution in #1083
- @nicdavidson made their first contribution in #1105
Full Changelog: v0.31.0...v0.31.2