## What's Changed
- Update version.txt after 0.14.4 release by @mrwyattii in #5694
- Fixed Windows inference build. by @costin-eseanu in #5609
- Fix memory leak from _hp_mapping by @chiragjn in #5643
- Bug fix for the "Link bit16 and fp32 parameters in partition" by @U-rara in #5681
- [CPU] add fp16 support to shm inference_all_reduce by @delock in #5669
- Universal checkpoint for zero stage 3 by @xylian86 in #5475
- Split inference unit test injectionPolicy world_size into multiple tests by @oelayan7 in #5687
- ENV var added for recaching in INF Unit tests by @raza-sikander in #5688
- Disable nvtx decorator to avoid graph break by @tohtana in #5697
- Add an argument to enable the injection of missing state during the conversion of universal checkpoints by @xylian86 in #5608
- Change source of CPUAdam for xpu accelerator by @Liangliang-Ma in #5703
- Add additional paths to trigger xpu tests by @loadams in #5707
- Update XPU docker version by @loadams in #5712
- Update XPU FusedAdam OpBuilder for PyTorch 2.3 by @baodii in #5702
- DeepSpeed Universal Checkpointing: Blog and Tutorial by @samadejacobs in #5711
- UCP Chinese Blog by @HeyangQin in #5713
- Fix tutorial links by @samadejacobs in #5714
- Update node16 check on self-hosted runners and remove python 3.6 by @loadams in #5756
- Fix missing argument and typo in test by @xylian86 in #5730
- [INF] Enable torch compile for inference by @oelayan7 in #5612
- Update checkout action for nv-human-eval workflow by @loadams in #5757
- Add Windows scripts (deepspeed, ds_report). by @costin-eseanu in #5699
- Unit Test: Add error handling for rate limit exceeded in model list by @HeyangQin in #5715
- Fix memory leak for pipelined optimizer swapper by @mauryaavinash95 in #5700
- Remove duplicated variable by @xu-song in #5727
- Fix phi3 mini 128k load error by @Yejing-Lai in #5765
- [CPU] Allow deepspeed.comm.inference_all_reduce in torch.compile graph by @delock in #5604
- Added wrappers for hpu tensors based on dtype by @deepcharm in #5771
- [bugfix] promote state in bf16_optimizer by @billishyahao in #5767
- Launcher mode with SSH bypass by @dogacancolak-kensho in #5728
- Update the list of supported models in the Chinese README of fastgen by @beep-bebop in #5773
- Add support for Microsoft Phi-3 model to DeepSpeed-FastGen by @adk9 in #5559
- Misplaced global variable `warned` by @anferico in #5725
- Fixes for latest Huggingface_hub changes on modelId -> id by @loadams in #5789
- reduce all-to-all communication volume when both expert and non-expert are tensor-parallel by @taozhiwei in #5626
- Update Ubuntu version for running python tests by @loadams in #5783
- fix: quantization with DeepSpeed HE by @Atry in #5624
- [INF] Add Qwen2RMSNorm to loaded layers in auto_tp by @oelayan7 in #5786
- Add chatglm2 & chatglm3 autotp by @Yejing-Lai in #5540
- Add new autotp supported model in doc by @Yejing-Lai in #5785
- Fix accuracy error of NPUFusedAdam by @penn513 in #5777
- Update torch version in cpu-torch-latest and nv-torch-latest-v100 tests to 2.4 by @loadams in #5797
- Move is_checkpointable call to reduce torch.compile graph breaks by @NirSonnenschein in #5759
- Unpin transformers version by @loadams in #5650
- Update other workflows to run on Ubuntu 22.04 by @loadams in #5798
- [XPU] Use host time to replace XPU time when IPEX version is lower than 2.5 by @ys950902 in #5796
- Update MII tests to pull correct torchvision by @loadams in #5800
- Add fp8-fused gemm kernel by @sfc-gh-reyazda in #5764
- Add doc of compressed backend in Onebit optimizers by @Liangliang-Ma in #5782
- fix: handle exception when loading cache file in test_inference.py by @HeyangQin in #5802
- Pin transformers version for MII tests by @loadams in #5807
- Fix op_builder for CUDA 12.5 by @keshavkowshik in #5806
- Find ROCm on Fedora by @trixirt in #5705
- Fix CPU Adam JIT compilation by @lekurile in #5780
- GDS AIO Blog by @jomayeri in #5817
- [ROCm] Get rocm version from /opt/rocm/.info/version by @rraminen in #5815
- sequence parallel with communication overlap by @inkcherry in #5691
- Update to ROCm6 by @loadams in #5491
- Add fp16 support of Qwen1.5MoE models (A2.7B) to DeepSpeed-FastGen by @ZonePG in #5403
- Use accelerator to replace cuda in setup and runner by @Andy666G in #5769
- Link GDS blog to site by @tjruwase in #5820
- Non-reentrant checkpointing hook fix by @ic-synth in #5781
- Fix NV references by @tjruwase in #5821
- Fix docs building guide by @tjruwase in #5825
- Update clang-format version from 16 to 18. by @loadams in #5839
- Add Japanese translation of DeepNVMe blog by @tohtana in #5845
- Fix bug in DeepSpeed sequence parallel with batch size larger than 1 by @YJHMITWEB in #5823
- Upgrade HPU image to v1.16.2. by @vshekhawat-hlab in #5610
- OptimizedLinear updates by @jeffra in #5791
- Log operator warnings only in verbose mode by @tjruwase in #5917
- Use `torch.nan_to_num` to replace the numpy wrapper by @jinyouzhi in #5877
- [Zero2] Reduce the unnecessary all-reduce when tensor size is 0. by @ys950902 in #5868
- Update container version for Gaudi2 CI by @raza-sikander in #5937
- Fix missing ds_id bug by @tjruwase in #5824
- Update LR scheduler configuration by @xiyang-aads-lilly in #5846
- HPUAccelerator: remove support in set_visible_devices_envs by @nelyahu in #5929
- Z3: optimizations for grad norm calculation and gradient clipping by @nelyahu in #5504
- Update xpu-max1100.yml with new config and add some tests by @Liangliang-Ma in #5668
- Add accelerator setup guides by @delock in #5827
- Allow accelerator to instantiate the device by @nelyahu in #5255
## New Contributors
- @U-rara made their first contribution in #5681
- @xylian86 made their first contribution in #5475
- @mauryaavinash95 made their first contribution in #5700
- @billishyahao made their first contribution in #5767
- @dogacancolak-kensho made their first contribution in #5728
- @beep-bebop made their first contribution in #5773
- @anferico made their first contribution in #5725
- @Atry made their first contribution in #5624
- @sfc-gh-reyazda made their first contribution in #5764
- @keshavkowshik made their first contribution in #5806
- @trixirt made their first contribution in #5705
- @Andy666G made their first contribution in #5769
- @ic-synth made their first contribution in #5781
- @xiyang-aads-lilly made their first contribution in #5846
Full Changelog: v0.14.4...v0.14.5