## New Features

## What's Changed
- Update version.txt after 0.12.6 release by @mrwyattii in #4850
- doc corrections by @goodship1 in #4861
- Fix exception handling in get_all_ranks_from_group() function by @HeyangQin in #4862
- deepspeed engine: fp16 support validation on init by @nelyahu in #4843
- Remove hooks on gradient accumulation on engine/optimizer destroy by @chiragjn in #4858
- optimize grad_norm calculation in stage3.py by @mmhab in #4436
- Fix f-string messages by @li-plus in #4865
- [NPU] Fix npu offload bug by @CurryRice233 in #4883
- Partition parameters: Minor refactoring of use_secondary_tensor condition by @deepcharm in #4868
- Pipeline: Add support to eval micro bs configuration by @nelyahu in #4859
- zero_to_fp32.py: Handle a case where shape doesn't have numel attr by @nelyahu in #4842
- Add support of Microsoft Phi-2 model to DeepSpeed-FastGen by @arashb in #4812
- Support cpu tensors without direct device invocation by @abhilash1910 in #3842
- add sharded loading for safetensors in AutoTP by @sywangyi in #4854
- [XPU] XPU accelerator support for Intel GPU device by @delock in #4547
- enable starcoder (kv_head=1) autotp by @Yejing-Lai in #4896
- Release overlap_comm & contiguous_gradients restrictions for ZeRO 1 by @li-plus in #4887
- [NPU]Add ZeRO-Infinity feature for NPU by @misstek in #4809
- fix num_kv_heads sharding in uneven autoTP for Falcon-40b by @Yejing-Lai in #4712
- NVMe offload checkpoint by @eisene in #4707
- Add WarmupCosineLR to Read the Docs by @dwyatte in #4916
- Add Habana Labs HPU accelerator support by @deepcharm in #4912
- Unit tests for MiCS by @zarzen in #4792
- Fix SD workflow to work with latest diffusers version by @lekurile in #4918
- [Fix] Fix cpu inference UT failure by @delock in #4430
- Add paths to run SD tests by @loadams in #4919
- Change PR/schedule triggers for CPU-inference by @loadams in #4924
- fix falcon-40b accuracy issue by @Yejing-Lai in #4895
- Refactor the positional embedding config code by @arashb in #4920
- Pin to triton 2.1.0 to fix issues with nv-inference by @loadams in #4929
- Add support of Qwen models (7b, 14b, 72b) to DeepSpeed-FastGen by @ZonePG in #4913
- DeepSpeedZeroOptimizer: refactor bit16 flattening to support more accelerators by @nelyahu in #4833
- Fix confusing width in simd_load by @yzhblind in #4714
- Specify permissions for secrets.GITHUB_TOKEN by @mrwyattii in #4927
- Enable quantizer op on ROCm by @rraminen in #4114
- autoTP for Qwen by @inkcherry in #4902
- Allow specifying mii branch for nv-a6000 workflow by @mrwyattii in #4936
- Only run MII CI for inference changes by @mrwyattii in #4939
- InfV2 - remove generation config requirement by @mrwyattii in #4938
- Cache HF model list for inference tests by @mrwyattii in #4940
- Fix docs inconsistency on default value for `ignore_unused_parameters` by @loadams in #4949
- Fix bug in CI model caching by @mrwyattii in #4951
- fix uneven issue & add balance autotp by @Yejing-Lai in #4697
- Optimize preprocess for ragged batching by @tohtana in #4942
- Fix bug where ZeRO2 never uses the reduce method. by @CurryRice233 in #4946
- [docs] Add new autotp supported model in tutorial by @delock in #4960
- Add missing op_builder.hpu component for HPU accelerator by @nelyahu in #4963
- Stage_1_and_2.py: fix assert for reduce_scatter configuration combinations by @nelyahu in #4964
- [MiCS]Add the path to support sequence_data_parallel on MiCS by @ys950902 in #4926
- Update the DeepSpeed Phi-2 impl. to work with the HF latest changes by @arashb in #4950
- Prevent infinite recursion when DS_ACCELERATOR is set to cuda by @ShukantPal in #4962
- Fixes for training models with bf16 + freshly initialized optimizer via `load_module_only` by @haileyschoelkopf in #4141
- params partition for skip_init by @inkcherry in #4722
- Enhance query APIs for text generation by @tohtana in #4965
- Add API to set a module as a leaf node when recursively setting Z3 hooks by @tohtana in #4966
- Fix T5 and mistral model meta data error by @Yejing-Lai in #4958
- FastGen Jan 2024 blog by @mrwyattii in #4980
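Several of the ZeRO changes above (e.g. #4887, which lifts the `overlap_comm` and `contiguous_gradients` restrictions for ZeRO stage 1) are controlled through the DeepSpeed JSON config. A minimal sketch of a stage-1 config exercising those flags might look like the following (values shown are illustrative, not recommended defaults):

```json
{
  "train_batch_size": 32,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 1,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```

Pass this file to `deepspeed.initialize()` via the usual `config` argument to enable the relaxed ZeRO-1 behavior.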
## New Contributors
- @chiragjn made their first contribution in #4858
- @li-plus made their first contribution in #4865
- @misstek made their first contribution in #4809
- @dwyatte made their first contribution in #4916
- @ZonePG made their first contribution in #4913
- @yzhblind made their first contribution in #4714
- @ShukantPal made their first contribution in #4962
- @haileyschoelkopf made their first contribution in #4141
**Full Changelog**: v0.12.6...v0.13.0