What's Changed
- Fix Bailing MoE by @awni in #514
- Fix batching for models with nested cache structures by @kernelpool in #510
- Fix: Correct weight masking for zero-computation experts in LongCat Flash MoE by @kernelpool in #508
- Simplify to_lora to not hardcode model types by @awni in #515
- Add Olmo3 by @Goekdeniz-Guelmez in #445
- Make mixed quantization affect attention in DeepSeek V3, others by @n8sh1 in #506
- Add Apriel 1.5 by @ivanfioravanti in #520
- feat: Refactor granitemoehybrid to support dense and non-hybrid variants by @gabe-l-hart in #518
New Contributors
- @kernelpool made their first contribution in #510
Full Changelog: v0.28.1...v0.28.2