ml-explore/mlx-lm v0.28.0 on GitHub

What's Changed

Allow fp8 by @awni in #431
Avoid cache-trimming crash in server for longcat chat and baichuan_m1 by @n8sh1 in #434
Fix hunyuan v1 dense by @awni in #440
Changes needed to facilitate batching by @awni in #430
remove manual conv class in mamba1 by @Goekdeniz-Guelmez in #436
adding Kwai-Klear/Klear-46B-A2.5B-Instruct by @Goekdeniz-Guelmez in #437
Add lille 130m by @Goekdeniz-Guelmez in #429
model: GraniteMoeHybrid by @gabe-l-hart in #442
fix server paths by @awni in #448
sdpa with sinks by @awni in #418
fix(quantization): Parameterize hardcoded group_size in mixed_quant_predicate_builder by @squaredice in #449
Adding Ling Mini by @Goekdeniz-Guelmez in #450
Adding Qwen3 Next by @Goekdeniz-Guelmez in #441
Faster ssm by @awni in #451
Update bitnet, nemotron h to use built-in relu2 from MLX by @Goekdeniz-Guelmez in #446
fix qwen3 next by @Goekdeniz-Guelmez in #453
Adding GLM by @Goekdeniz-Guelmez in #457
Add an introduction to the default LLM in README.md by @aopstudio in #461
Fix TypeError: Model.__call__() got an unexpected keyword argument 'mask' for qwen2_vl, mistral3 by @neilmehta24 in #464
Add groups to ssm kernel and update more models by @awni in #456
Fix gemma3 window mask by @awni in #465
Batch generation by @awni in #443
Batch support for mamba-style models by @awni in #468
fix: handle cache offset safely for mamba error by @ivanfioravanti in #472
Adds LLaMA 4 text model implementation in MLX by @robbiemu in #469
Allow sampler to work with batched_generate by @N8python in #473
Adding support for mamba2 by @Goekdeniz-Guelmez in #392
Fix llama4 text and make trainable by @Goekdeniz-Guelmez in #474
Extends quantization predicate with config by @robbiemu in #476
Gated-Delta Fused Kernel (Qwen3Next) by @ivanfioravanti in #454

New Contributors

@gabe-l-hart made their first contribution in #442
@squaredice made their first contribution in #449
@aopstudio made their first contribution in #461
@robbiemu made their first contribution in #469

Full Changelog: v0.27.1...v0.28.0