What's Changed
- Revert symmetric KL by @awni in #359
- Adding bailing_moe (ling-lite, -plus, -coder) by @Goekdeniz-Guelmez in #369
- Add SwanLab experiment tracking support for MLX by @ShaohonChen in #317
- Fix gpt-oss LoRA NaN by @awni in #370
- Properly tie embeddings and lm head for gemma3 by @awni in #373
- Fix distributed evaluate by @angeloskath in #368
- Add SSE keepalive to stop client disconnects during prompt processing by @dysangel in #362
- Add LFM2-VL model implementation by @christian-lms in #378
- Adding trust_remote_code=True for training by @Goekdeniz-Guelmez in #383
- Remove comma and add Muon by @Goekdeniz-Guelmez in #381
- Add into the LoRA to layer utils by @Goekdeniz-Guelmez in #382
- Make KL and JS Metal kernels only if Metal is available by @vsabolcec in #387
- Fix sampling with small top-k by @awni in #388
- Fix window attention mask by @awni in #390
- Add support for ByteDance Seed-OSS-36B-Instruct model by @dnakov in #391
- Add Qwen2-VL model implementation by @vincentamato in #384
New Contributors
- @ShaohonChen made their first contribution in #317
- @dysangel made their first contribution in #362
- @dnakov made their first contribution in #391
- @vincentamato made their first contribution in #384
Full Changelog: v0.26.3...v0.26.4