What's Changed
- Allow fp8 by @awni in #431
- Avoid cache-trimming crash in server for longcat chat and baichuan_m1 by @n8sh1 in #434
- Fix hunyuan v1 dense by @awni in #440
- Changes needed to facilitate batching by @awni in #430
- remove manual conv class in mamba1 by @Goekdeniz-Guelmez in #436
- adding Kwai-Klear/Klear-46B-A2.5B-Instruct by @Goekdeniz-Guelmez in #437
- Add lille 130m by @Goekdeniz-Guelmez in #429
- model: GraniteMoeHybrid by @gabe-l-hart in #442
- fix server paths by @awni in #448
- sdpa with sinks by @awni in #418
- fix(quantization): Parameterize hardcoded group_size in mixed_quant_predicate_builder by @squaredice in #449
- Adding Ling Mini by @Goekdeniz-Guelmez in #450
- Adding Qwen3 Next by @Goekdeniz-Guelmez in #441
- Faster ssm by @awni in #451
- Update bitnet, nemotron h to use built-in relu2 from MLX by @Goekdeniz-Guelmez in #446
- fix qwen3 next by @Goekdeniz-Guelmez in #453
- Adding GLM by @Goekdeniz-Guelmez in #457
- Add an introduction to the default LLM in README.md by @aopstudio in #461
- Fix TypeError: Model.__call__() got an unexpected keyword argument 'mask' for qwen2_vl, mistral3 by @neilmehta24 in #464
- Add groups to ssm kernel and update more models by @awni in #456
- Fix gemma3 window mask by @awni in #465
- Batch generation by @awni in #443
- Batch support for mamba-style models by @awni in #468
- fix: handle cache offset safely for mamba error by @ivanfioravanti in #472
- Adds LLaMA 4 text model implementation in MLX by @robbiemu in #469
- Allow sampler to work with batched_generate by @N8python in #473
- Adding support for mamba2 by @Goekdeniz-Guelmez in #392
- Fix llama4 text and make trainable by @Goekdeniz-Guelmez in #474
- Extends quantization predicate with config by @robbiemu in #476
- Gated-Delta Fused Kernel (Qwen3Next) by @ivanfioravanti in #454
New Contributors
- @gabe-l-hart made their first contribution in #442
- @squaredice made their first contribution in #449
- @aopstudio made their first contribution in #461
- @robbiemu made their first contribution in #469
Full Changelog: v0.27.1...v0.28.0