## New Features

## What's Changed
- Update version.txt after 0.12.6 release by @mrwyattii in #4850
- doc corrections by @goodship1 in #4861
- Fix exception handling in get_all_ranks_from_group() function by @HeyangQin in #4862
- deepspeed engine: fp16 support validation on init by @nelyahu in #4843
- Remove hooks on gradient accumulation on engine/optimizer destroy by @chiragjn in #4858
- optimize grad_norm calculation in stage3.py by @mmhab in #4436
- Fix f-string messages by @li-plus in #4865
- [NPU] Fix npu offload bug by @CurryRice233 in #4883
- Partition parameters: Minor refactoring of use_secondary_tensor condition by @deepcharm in #4868
- Pipeline: Add support to eval micro bs configuration by @nelyahu in #4859
- zero_to_fp32.py: Handle a case where shape doesn't have numel attr by @nelyahu in #4842
- Add support of Microsoft Phi-2 model to DeepSpeed-FastGen by @arashb in #4812
- Support cpu tensors without direct device invocation by @abhilash1910 in #3842
- add sharded loading for safetensors in AutoTP by @sywangyi in #4854
- [XPU] XPU accelerator support for Intel GPU device by @delock in #4547
- enable starcoder (kv_head=1) autotp by @Yejing-Lai in #4896
- Release overlap_comm & contiguous_gradients restrictions for ZeRO 1 by @li-plus in #4887
- [NPU]Add ZeRO-Infinity feature for NPU by @misstek in #4809
- fix num_kv_heads sharding in uneven autoTP for Falcon-40b by @Yejing-Lai in #4712
- NVMe offload checkpoint by @eisene in #4707
- Add WarmupCosineLR to Read the Docs by @dwyatte in #4916
- Add Habana Labs HPU accelerator support by @deepcharm in #4912
- Unit tests for MiCS by @zarzen in #4792
- Fix SD workflow to work with latest diffusers version by @lekurile in #4918
- [Fix] Fix cpu inference UT failure by @delock in #4430
- Add paths to run SD tests by @loadams in #4919
- Change PR/schedule triggers for CPU-inference by @loadams in #4924
- fix falcon-40b accuracy issue by @Yejing-Lai in #4895
- Refactor the positional embedding config code by @arashb in #4920
- Pin to triton 2.1.0 to fix issues with nv-inference by @loadams in #4929
- Add support of Qwen models (7b, 14b, 72b) to DeepSpeed-FastGen by @ZonePG in #4913
- DeepSpeedZeroOptimizer: refactor bit16 flattening to support more accelerators by @nelyahu in #4833
- Fix confusing width in simd_load by @yzhblind in #4714
- Specify permissions for secrets.GITHUB_TOKEN by @mrwyattii in #4927
- Enable quantizer op on ROCm by @rraminen in #4114
- autoTP for Qwen by @inkcherry in #4902
- Allow specifying mii branch for nv-a6000 workflow by @mrwyattii in #4936
- Only run MII CI for inference changes by @mrwyattii in #4939
- InfV2 - remove generation config requirement by @mrwyattii in #4938
- Cache HF model list for inference tests by @mrwyattii in #4940
- Fix docs inconsistency on default value for `ignore_unused_parameters` by @loadams in #4949
- Fix bug in CI model caching by @mrwyattii in #4951
- fix uneven issue & add balance autotp by @Yejing-Lai in #4697
- Optimize preprocess for ragged batching by @tohtana in #4942
- Fix bug where ZeRO2 never uses the reduce method. by @CurryRice233 in #4946
- [docs] Add new autotp supported model in tutorial by @delock in #4960
- Add missing op_builder.hpu component for HPU accelerator by @nelyahu in #4963
- Stage_1_and_2.py: fix assert for reduce_scatter configuration combinations by @nelyahu in #4964
- [MiCS]Add the path to support sequence_data_parallel on MiCS by @ys950902 in #4926
- Update the DeepSpeed Phi-2 impl. to work with the HF latest changes by @arashb in #4950
- Prevent infinite recursion when DS_ACCELERATOR is set to cuda by @ShukantPal in #4962
- Fixes for training models with bf16 + freshly initialized optimizer via `load_module_only` by @haileyschoelkopf in #4141
- params partition for skip_init by @inkcherry in #4722
- Enhance query APIs for text generation by @tohtana in #4965
- Add API to set a module as a leaf node when recursively setting Z3 hooks by @tohtana in #4966
- Fix T5 and mistral model meta data error by @Yejing-Lai in #4958
- FastGen Jan 2024 blog by @mrwyattii in #4980
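Several of the ZeRO changes above (e.g. #4887, which lifts the `overlap_comm` and `contiguous_gradients` restrictions for ZeRO stage 1) are controlled through the DeepSpeed JSON config. A minimal sketch of a stage-1 config exercising those flags might look like the following (values shown are illustrative, not recommended defaults):

```json
{
  "train_batch_size": 32,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 1,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```

Pass this file to `deepspeed.initialize()` via the usual `config` argument to enable the relaxed ZeRO-1 behavior.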
## New Contributors
- @chiragjn made their first contribution in #4858
- @li-plus made their first contribution in #4865
- @misstek made their first contribution in #4809
- @dwyatte made their first contribution in #4916
- @ZonePG made their first contribution in #4913
- @yzhblind made their first contribution in #4714
- @ShukantPal made their first contribution in #4962
- @haileyschoelkopf made their first contribution in #4141
**Full Changelog**: v0.12.6...v0.13.0