## What's Changed
- Update version.txt after 0.14.1 release by @mrwyattii in #5413
- Remove dtype(fp16) condition check for residual_add unit test by @raza-sikander in #5329
- [XPU] Use non_daemonic_proc by default on XPU device by @ys950902 in #5412
- Fix a convergence issue in TP topology caused by incorrect grad_norm by @inkcherry in #5411 (see the grad-norm sketch after this list)
- Update 'create-pr' action in release workflow to latest by @loadams in #5415
- Update engine.py to avoid torch warning by @etiennebonnafoux in #5408
- Update _sidebar.scss by @fasterinnerlooper in #5293
- Add more tests into XPU CI by @Liangliang-Ma in #5427
- [CPU] Support SHM based inference_all_reduce in TorchBackend by @delock in #5391
- Add required paths to trigger AMD tests on PRs by @loadams in #5406
- Bug fix in split_index method by @bm-synth in #5292 (see the split_index sketch after this list)
- Parallel map step for DistributedDataAnalyzer map-reduce by @bm-synth in #5291
- Selective dequantization by @RezaYazdaniAminabadi in #5375
- Fix sorting of shard optimizer state files for universal checkpoint by @tohtana in #5395 (see the sorting sketch after this list)
- Add device config env for the accelerator by @shiyuan680 in #5396
- 64-bit indexing for fused Adam by @garrett4wade in #5187
- Improve parallel process of universal checkpoint conversion by @tohtana in #5343
- Set the default to use set_to_none when clearing gradients in the BF16 optimizer by @inkcherry in #5434 (see the set_to_none sketch after this list)
- OptimizedLinear implementation by @jeffra in #5355
- Update README.md by @Jhonso7393 in #5453
- Update PyTest torch version to match PyTorch latest official (2.3.0) by @loadams in #5454
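
A few of the fixes above are easier to see with short sketches. For the grad-norm fix in #5411: under tensor parallelism, the squared local gradient norms must be summed across the TP group before taking the square root. The sketch below only illustrates that general pattern; it is not the DeepSpeed implementation, and it ignores parameters replicated across TP ranks, which should be counted on a single rank.

```python
import torch
import torch.distributed as dist

def tp_global_grad_norm(params, tp_group=None):
    # Illustrative only: assumes every parameter here is a distinct
    # local shard (replicated params would be double-counted).
    grads = [p.grad.detach() for p in params if p.grad is not None]
    device = grads[0].device if grads else torch.device("cpu")
    # Sum of squared gradient norms for the locally held shards.
    local_sq = torch.zeros(1, device=device)
    for g in grads:
        local_sq += g.float().norm() ** 2
    # Reduce the squared norms across the tensor-parallel group, then take
    # the square root once; taking per-rank norms and summing those is wrong.
    if dist.is_initialized():
        dist.all_reduce(local_sq, op=dist.ReduceOp.SUM, group=tp_group)
    return local_sq.sqrt().item()
```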
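For the split_index fix in #5292: the upstream method partitions a range of sample indices among workers. The snippet below is a hypothetical stand-in, not the DeepSpeed code, showing the invariant such a method must satisfy: contiguous chunks that cover the whole range, with sizes differing by at most one.

```python
def split_index(start, end, num_splits):
    # Hypothetical stand-in: partition the half-open range [start, end)
    # into num_splits contiguous chunks whose sizes differ by at most one.
    size, rem = divmod(end - start, num_splits)
    chunks, lo = [], start
    for i in range(num_splits):
        hi = lo + size + (1 if i < rem else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks

assert split_index(0, 10, 3) == [(0, 4), (4, 7), (7, 10)]
```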
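For the universal-checkpoint sorting fix in #5395: the classic pitfall with shard files is that lexicographic order misplaces multi-digit shard numbers, and a numeric sort key fixes it. The filenames below are made up for illustration and do not match DeepSpeed's actual naming.

```python
import re

# Hypothetical shard filenames; plain sorted() puts shard 10 before shard 2.
files = ["optim_states-2.pt", "optim_states-10.pt", "optim_states-1.pt"]
print(sorted(files))
# ['optim_states-1.pt', 'optim_states-10.pt', 'optim_states-2.pt']

def shard_index(name: str) -> int:
    # Sort by the trailing shard number instead of by raw string.
    return int(re.search(r"(\d+)\.pt$", name).group(1))

print(sorted(files, key=shard_index))
# ['optim_states-1.pt', 'optim_states-2.pt', 'optim_states-10.pt']
```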
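Finally, for the BF16 optimizer default in #5434: set_to_none is standard PyTorch zero_grad behavior in which gradient tensors are freed rather than zeroed in place, saving memory and a redundant kernel launch. A minimal, generic PyTorch illustration (not DeepSpeed internals):

```python
import torch

model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

model(torch.randn(1, 4)).sum().backward()

# set_to_none=True releases the gradient tensors instead of filling
# them with zeros, so the next backward() allocates them fresh.
opt.zero_grad(set_to_none=True)
assert all(p.grad is None for p in model.parameters())
```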
## New Contributors
- @etiennebonnafoux made their first contribution in #5408
- @fasterinnerlooper made their first contribution in #5293
- @shiyuan680 made their first contribution in #5396
- @garrett4wade made their first contribution in #5187
- @Jhonso7393 made their first contribution in #5453
**Full Changelog**: v0.14.1...v0.14.2