What's Changed
- Fix Bailing MoE by @awni in #514
- Fix batching for models with nested cache structures by @kernelpool in #510
- Fix: Correct weight masking for zero-computation experts in LongCat Flash MoE by @kernelpool in #508
- Simplify to_lora to not hardcode model types by @awni in #515
- Add Olmo3 by @Goekdeniz-Guelmez in #445
- Make mixed quantization affect attention in DeepSeek V3, others by @n8sh1 in #506
- Add Apriel 1.5 by @ivanfioravanti in #520
- feat: Refactor granitemoehybrid to support dense and non-hybrid variants by @gabe-l-hart in #518
New Contributors
- @kernelpool made their first contribution in #510
Full Changelog: v0.28.1...v0.28.2