What's Changed
- Removing the deprecated wandb params by @Goekdeniz-Guelmez in #524
- Memory-efficient SSM by @awni in #525
- Fix LoRA MoEs by @awni in #522
- Fix Bailing MoE by @awni in #521
- GPT2 Batching Fix by @shepardxia in #529
- Remove act loss and add temp in DWQ by @awni in #500
- Cleanup and simplify model I/O by @awni in #532
- Fix: Add future annotations import to qwen3_next.py for Python 3.9 compatibility by @mzau in #533
- Add LFM2 MoE by @Blaizzy in #537
- Fix example command to quantize a model using GPTQ by @felladrin in #539
- Minor typing issues by @mercush in #540
- Fix CUDA install by @awni in #542
- Add Qwen3-VL language model implementation by @vincentamato in #547
- Fix mask for batched SSM by @awni in #546
- Added gradient accumulation to training loop by @dotvignesh in #511
- Support data parallel eval for generation tasks by @awni in #549
- Optimize Bailing MoE by @kernelpool in #550
- Add Jamba by @Goekdeniz-Guelmez in #544
- Add Qwen3-VL (Dense) language model implementation by @vincentamato in #553
- LLM Benchmarks by @awni in #552
- Add support for nanochat by @dnakov in #554
- version by @awni in #558
New Contributors
- @shepardxia made their first contribution in #529
- @mzau made their first contribution in #533
- @felladrin made their first contribution in #539
- @mercush made their first contribution in #540
- @dotvignesh made their first contribution in #511
Full Changelog: v0.28.2...v0.28.3