What's Changed
- Update version.txt after 0.15.0 release by @loadams in #6403
- Fix Type Mismatch by @jomayeri in #6410
- Fix redundant seq data parallel grp argument in Z3/MiCS by @samadejacobs in #5352
- add Huawei Ascend NPU setup guide by @xuedinge233 in #6445
- Add documentation for launcher without SSH by @dogacancolak-kensho in #6455
- Dtype support check for accelerator in UTs by @raza-sikander in #6360
- Store/Load CIFAR from local/offline by @raza-sikander in #6390
- Add the accelerator setup guide link in Getting Started page by @rogerxfeng8 in #6452
- Allow triton==3.0.x for fp_quantizer by @siddartha-RE in #6447
- Change GDS to 1 AIO thread by @jomayeri in #6459
- [CCL] fix condition issue in ccl.py by @YizhouZ in #6443
- Avoid gds build errors on ROCm by @rraminen in #6456
- TestLowCpuMemUsage UT get device by device_name by @raza-sikander in #6397
- Add workflow to build DS without torch to better test before releases by @loadams in #6450
- Fix patch for parameter partitioning in zero.Init() by @tohtana in #6388
- Add default value to "checkpoint_folder" in "load_state_dict" of bf16_optimizer by @ljcc0930 in #6446
- DeepNVMe tutorial by @tjruwase in #6449
- bf16_optimizer: fixes to different grad acc dtype by @nelyahu in #6485
- print warning if actual triton cache dir is on NFS, not just for default by @jrandall in #6487
- DS_BUILD_OPS should build only compatible ops by @tjruwase in #6489
- Safe usage of popen by @tjruwase in #6490
- Handle an edge case where CUDA_HOME is not defined on ROCm systems by @amorehead in #6488
New Contributors
- @xuedinge233 made their first contribution in #6445
- @siddartha-RE made their first contribution in #6447
- @ljcc0930 made their first contribution in #6446
- @jrandall made their first contribution in #6487
- @amorehead made their first contribution in #6488
Full Changelog: v0.15.0...v0.15.1