What's Changed
- Update version.txt after 0.12.5 release by @mrwyattii in #4826
- Cache metadata for TP activations and grads by @BacharL in #4360
- Inference changes for incorporating meta loading checkpoint by @oelayan7 in #4692
- Update CODEOWNERS by @mrwyattii in #4838
- Support Baichuan model by @baodii in #4721
- inference engine: check if accelerator supports FP16 by @nelyahu in #4832
- Update zeropp.md by @goodship1 in #4835
- [NPU] load EXPORT_ENV based on different accelerators to support multi-node training on other devices by @minchao-sun in #4830
- Add cuda_accelerator.py to triggers for A6000 test by @mrwyattii in #4848
- Capture short kernel sequences to graph by @inkcherry in #4318
- Checkpointing: Avoid assigning tensor storage with different device by @deepcharm in #4836
- engine.py: remove unused _curr_save_path by @nelyahu in #4844
- Mixtral FastGen Support by @cmikeh2 in #4828
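
As a usage note for the Mixtral FastGen item above, here is a minimal sketch of running FastGen inference through the DeepSpeed-MII `pipeline` API; the model identifier and generation parameters are illustrative assumptions and are not taken from this release.

```python
# Minimal sketch (assumption: FastGen used via the DeepSpeed-MII pipeline API;
# the Mixtral checkpoint name and settings below are illustrative only).
import mii

# Build a non-persistent FastGen pipeline for a Mixtral checkpoint.
pipe = mii.pipeline("mistralai/Mixtral-8x7B-v0.1")

# Generate text for a batch of prompts.
responses = pipe(["DeepSpeed is", "Mixture-of-experts models are"],
                 max_new_tokens=64)
for response in responses:
    print(response.generated_text)
```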
New Contributors
- @minchao-sun made their first contribution in #4830
Full Changelog: v0.12.5...v0.12.6