## FSDPv2 support
This release introduces support for FSDPv2, thanks to @S1ro1.
If you are configuring FSDP from Python code, set `fsdp_version=2` in `FullyShardedDataParallelPlugin`:
```python
from accelerate import FullyShardedDataParallelPlugin, Accelerator

fsdp_plugin = FullyShardedDataParallelPlugin(
    fsdp_version=2
    # other options...
)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)
```

If you want to convert a YAML config that contains the FSDPv1 config to an FSDPv2 one, use our conversion tool:
```bash
accelerate to-fsdp2 --config_file config.yaml --output_file new_config.yaml
```
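For reference, the converted config marks the FSDP version inside the `fsdp_config` section. The fragment below is a minimal sketch; the exact set of keys the converter emits depends on your original FSDPv1 config:

```yaml
# Sketch of an FSDPv2 accelerate config fragment (illustrative keys only)
distributed_type: FSDP
fsdp_config:
  fsdp_version: 2
num_processes: 8
```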
To learn more about the difference between FSDPv1 and FSDPv2, read the following documentation.
## DeepSpeed TP support
We have added initial support for DeepSpeed + TP. Few changes were required, as the DeepSpeed APIs were already compatible: we only needed to make sure that the dataloader was compatible with TP and that we were able to save the TP weights. Thanks @inkcherry for the work! #3390
To use TP with DeepSpeed, update your DeepSpeed config file to include the `tensor_parallel` key:
```
...
"tensor_parallel": {
    "autotp_size": ${autotp_size}
},
...
```
More details in this DeepSpeed PR.
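To show where the new key sits in a full config, here is a minimal sketch that builds a DeepSpeed config programmatically. The `autotp_size` value and the neighboring keys are placeholders, not recommendations:

```python
import json

# Illustrative DeepSpeed config containing the new "tensor_parallel" section.
# train_batch_size and autotp_size are placeholder values for this sketch.
ds_config = {
    "train_batch_size": 8,
    "tensor_parallel": {
        "autotp_size": 2,  # hypothetical tensor-parallel degree
    },
}

# Serialize to pass as a JSON config file to DeepSpeed
config_json = json.dumps(ds_config, indent=2)
print(config_json)
```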
## Support for XCCL distributed backend
We've added support for XCCL, an Intel distributed backend that can be used with XPU devices. More details in this torch PR. Thanks @dvrogozh for the integration!
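Conceptually, XCCL slots in next to the existing per-device backends. The helper below is a hypothetical sketch (not an accelerate API) of how a launcher might map a device type to a `torch.distributed` backend name:

```python
# Hypothetical helper: map a device type to a torch.distributed backend name.
# "xccl" for Intel XPUs is the backend added in the torch PR referenced above.
def pick_backend(device_type: str) -> str:
    backends = {
        "cuda": "nccl",  # NVIDIA GPUs
        "xpu": "xccl",   # Intel XPUs
        "cpu": "gloo",   # CPU fallback
    }
    return backends.get(device_type, "gloo")

print(pick_backend("xpu"))  # → xccl
```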
## What's Changed
- Add `log_artifact`, `log_artifacts` and `log_figure` capabilities to the MLflowTracker by @luiz0992 in #3419
- tensor parallel dataloader for deepspeed accelerator by @inkcherry in #3390
- Fix prod issues by @muellerzr in #3441
- Fix attribute issue with deepspeed tp by @SunMarc in #3443
- Fixed typo in the multi node FSDP slurm example script by @JacobB33 in #3447
- feat: Add no_ssh and slurm multinode launcher options for deepspeed by @hsmallbone in #3329
- Fixup ao module filter func by @muellerzr in #3450
- remove device index workaround on xpu since xpu supports integer device index as cuda now by @yao-matrix in #3448
- enable 2 UT cases on XPU by @yao-matrix in #3445
- Fix AMD GPU support with should_reduce_batch_size() by @cameronshinn in #3405
- Fix device KeyError in tied_params_map by @dvrogozh in #3403
- Initial FSDP2 support by @S1ro1 in #3394
- Fix: clip grad norm in fsdp2 by @S1ro1 in #3465
- Update @ by @muellerzr in #3466
- Fix seeding of new generator for multi GPU by @albertcthomas in #3459
- Fix get_balanced_memory for MPS by @booxter in #3464
- Update CometMLTracker to allow re-using experiment by @Lothiraldan in #3328
- Apply ruff py39 fixes by @cyyever in #3461
- xpu: enable xccl distributed backend by @dvrogozh in #3401
- Update ruff target-version to py39 and apply more fixes by @cyyever in #3470
- [MLU] fix deepspeed dependency by @huismiling in #3472
- remove use_xpu to fix ut issues, we don't need this since XPU is OOB … by @yao-matrix in #3460
- Bump ruff to 0.11.2 by @cyyever in #3471
## New Contributors
- @luiz0992 made their first contribution in #3419
- @inkcherry made their first contribution in #3390
- @JacobB33 made their first contribution in #3447
- @hsmallbone made their first contribution in #3329
- @yao-matrix made their first contribution in #3448
- @cameronshinn made their first contribution in #3405
- @S1ro1 made their first contribution in #3394
- @albertcthomas made their first contribution in #3459
- @booxter made their first contribution in #3464
- @Lothiraldan made their first contribution in #3328
- @cyyever made their first contribution in #3461
Full Changelog: v1.5.2...v1.6.0