## What's Changed
- Update version.txt after 0.14.4 release by @mrwyattii in #5694
- Fixed Windows inference build. by @costin-eseanu in #5609
- Fix memory leak from _hp_mapping by @chiragjn in #5643
- Bug fix for the "Link bit16 and fp32 parameters in partition" by @U-rara in #5681
- [CPU] add fp16 support to shm inference_all_reduce by @delock in #5669
- Universal checkpoint for zero stage 3 by @xylian86 in #5475
- Split inference unit test injectionPolicy world_size into multiple tests by @oelayan7 in #5687
- ENV var added for recaching in INF Unit tests by @raza-sikander in #5688
- Disable nvtx decorator to avoid graph break by @tohtana in #5697
- Add an argument to enable the injection of missing state during the conversion of universal checkpoints by @xylian86 in #5608
- Change source of CPUAdam for xpu accelerator by @Liangliang-Ma in #5703
- Add additional paths to trigger xpu tests by @loadams in #5707
- Update XPU docker version by @loadams in #5712
- Update XPU FusedAdam OpBuilder for PyTorch 2.3 by @baodii in #5702
- DeepSpeed Universal Checkpointing: Blog and Tutorial by @samadejacobs in #5711
- UCP Chinese Blog by @HeyangQin in #5713
- Fix tutorial links by @samadejacobs in #5714
- Update node16 check on self-hosted runners and remove python 3.6 by @loadams in #5756
- Fix missing argument and typo in test by @xylian86 in #5730
- [INF] Enable torch compile for inference by @oelayan7 in #5612
- Update checkout action for nv-human-eval workflow by @loadams in #5757
- Add Windows scripts (deepspeed, ds_report). by @costin-eseanu in #5699
- Unit Test: Add error handling for rate limit exceeded in model list by @HeyangQin in #5715
- Fix memory leak for pipelined optimizer swapper by @mauryaavinash95 in #5700
- Remove duplicated variable by @xu-song in #5727
- Fix phi3 mini 128k load error by @Yejing-Lai in #5765
- [CPU] Allow deepspeed.comm.inference_all_reduce in torch.compile graph by @delock in #5604
- Added wrappers for hpu tensors based on dtype by @deepcharm in #5771
- [bugfix] promote state in bf16_optimizer by @billishyahao in #5767
- Launcher mode with SSH bypass by @dogacancolak-kensho in #5728
- Update the list of supported models in the Chinese README of fastgen by @beep-bebop in #5773
- Add support for Microsoft Phi-3 model to DeepSpeed-FastGen by @adk9 in #5559
- Misplaced global variable `warned` by @anferico in #5725
- Fixes for latest Huggingface_hub changes on modelId -> id by @loadams in #5789
- reduce all-to-all communication volume when both expert and non-expert are tensor-parallel by @taozhiwei in #5626
- Update Ubuntu version for running python tests by @loadams in #5783
- fix: quantization with DeepSpeed HE by @Atry in #5624
- [INF] Add Qwen2RMSNorm to loaded layers in auto_tp by @oelayan7 in #5786
- Add chatglm2 & chatglm3 autotp by @Yejing-Lai in #5540
- Add new autotp supported model in doc by @Yejing-Lai in #5785
- Fix accuracy error of NPUFusedAdam by @penn513 in #5777
- Update torch version in cpu-torch-latest and nv-torch-latest-v100 tests to 2.4 by @loadams in #5797
- Move is_checkpointable call to reduce torch.compile graph breaks by @NirSonnenschein in #5759
- Unpin transformers version by @loadams in #5650
- Update other workflows to run on Ubuntu 22.04 by @loadams in #5798
- [XPU] Use host time to replace XPU time when IPEX version is lower than 2.5 by @ys950902 in #5796
- Update MII tests to pull correct torchvision by @loadams in #5800
- Add fp8-fused gemm kernel by @sfc-gh-reyazda in #5764
- Add doc of compressed backend in Onebit optimizers by @Liangliang-Ma in #5782
- fix: handle exception when loading cache file in test_inference.py by @HeyangQin in #5802
- Pin transformers version for MII tests by @loadams in #5807
- Fix op_builder for CUDA 12.5 by @keshavkowshik in #5806
- Find ROCm on Fedora by @trixirt in #5705
- Fix CPU Adam JIT compilation by @lekurile in #5780
- GDS AIO Blog by @jomayeri in #5817
- [ROCm] Get rocm version from /opt/rocm/.info/version by @rraminen in #5815
- sequence parallel with communication overlap by @inkcherry in #5691
- Update to ROCm6 by @loadams in #5491
- Add fp16 support of Qwen1.5MoE models (A2.7B) to DeepSpeed-FastGen by @ZonePG in #5403
- Use accelerator to replace cuda in setup and runner by @Andy666G in #5769
- Link GDS blog to site by @tjruwase in #5820
- Non-reentrant checkpointing hook fix by @ic-synth in #5781
- Fix NV references by @tjruwase in #5821
- Fix docs building guide by @tjruwase in #5825
- Update clang-format version from 16 to 18. by @loadams in #5839
- Add Japanese translation of DeepNVMe blog by @tohtana in #5845
- Fix bug in DeepSpeed sequence parallel with batch size larger than 1 by @YJHMITWEB in #5823
- Upgrade HPU image to v1.16.2. by @vshekhawat-hlab in #5610
- OptimizedLinear updates by @jeffra in #5791
- Log operator warnings only in verbose mode by @tjruwase in #5917
- Use `torch.nan_to_num` to replace the numpy wrapper by @jinyouzhi in #5877
- [Zero2] Reduce the unnecessary all-reduce when tensor size is 0. by @ys950902 in #5868
- Update container version for Gaudi2 CI by @raza-sikander in #5937
- Fix missing ds_id bug by @tjruwase in #5824
- Update LR scheduler configuration by @xiyang-aads-lilly in #5846
- HPUAccelerator: remove support in set_visible_devices_envs by @nelyahu in #5929
- Z3: optimizations for grad norm calculation and gradient clipping by @nelyahu in #5504
- Update xpu-max1100.yml with new config and add some tests by @Liangliang-Ma in #5668
- Add accelerator setup guides by @delock in #5827
- Allow accelerator to instantiate the device by @nelyahu in #5255
## New Contributors
- @U-rara made their first contribution in #5681
- @xylian86 made their first contribution in #5475
- @mauryaavinash95 made their first contribution in #5700
- @billishyahao made their first contribution in #5767
- @dogacancolak-kensho made their first contribution in #5728
- @beep-bebop made their first contribution in #5773
- @anferico made their first contribution in #5725
- @Atry made their first contribution in #5624
- @sfc-gh-reyazda made their first contribution in #5764
- @keshavkowshik made their first contribution in #5806
- @trixirt made their first contribution in #5705
- @Andy666G made their first contribution in #5769
- @ic-synth made their first contribution in #5781
- @xiyang-aads-lilly made their first contribution in #5846
Full Changelog: v0.14.4...v0.14.5