What's Changed
- Update version.txt after 0.12.5 release by @mrwyattii in #4826
- Cache metadata for TP activations and grads by @BacharL in #4360
- Inference changes for incorporating meta loading checkpoint by @oelayan7 in #4692
- Update CODEOWNERS by @mrwyattii in #4838
- Support Baichuan model by @baodii in #4721
- inference engine: check if accelerator supports FP16 by @nelyahu in #4832
- Update zeropp.md by @goodship1 in #4835
- [NPU] load EXPORT_ENV based on different accelerators to support multi-node training on other devices by @minchao-sun in #4830
- Add cuda_accelerator.py to triggers for A6000 test by @mrwyattii in #4848
- Capture short kernel sequences to graph by @inkcherry in #4318
- Checkpointing: Avoid assigning tensor storage with different device by @deepcharm in #4836
- engine.py: remove unused _curr_save_path by @nelyahu in #4844
- Mixtral FastGen Support by @cmikeh2 in #4828
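
As a usage note for the Mixtral FastGen item above, here is a minimal sketch of running FastGen inference through the DeepSpeed-MII `pipeline` API; the model identifier and generation parameters are illustrative assumptions and are not taken from this release.

```python
# Minimal sketch (assumption: FastGen used via the DeepSpeed-MII pipeline API;
# the Mixtral checkpoint name and settings below are illustrative only).
import mii

# Build a non-persistent FastGen pipeline for a Mixtral checkpoint.
pipe = mii.pipeline("mistralai/Mixtral-8x7B-v0.1")

# Generate text for a batch of prompts.
responses = pipe(["DeepSpeed is", "Mixture-of-experts models are"],
                 max_new_tokens=64)
for response in responses:
    print(response.generated_text)
```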
New Contributors
- @minchao-sun made their first contribution in #4830
Full Changelog: v0.12.5...v0.12.6