## FSDPv2 support
This release introduces support for FSDPv2, thanks to @S1ro1.
If you are configuring FSDP from Python code, set `fsdp_version=2` in `FullyShardedDataParallelPlugin`:
```python
from accelerate import FullyShardedDataParallelPlugin, Accelerator

fsdp_plugin = FullyShardedDataParallelPlugin(
    fsdp_version=2
    # other options...
)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)
```

If you want to convert a YAML config that contains the FSDPv1 config to an FSDPv2 one, use our conversion tool:
```bash
accelerate to-fsdp2 --config_file config.yaml --output_file new_config.yaml
```
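For reference, the converted config marks the FSDP version inside the `fsdp_config` section. The fragment below is a minimal sketch; the exact set of keys the converter emits depends on your original FSDPv1 config:

```yaml
# Sketch of an FSDPv2 accelerate config fragment (illustrative keys only)
distributed_type: FSDP
fsdp_config:
  fsdp_version: 2
num_processes: 8
```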
To learn more about the difference between FSDPv1 and FSDPv2, read the following documentation.
## DeepSpeed TP support
We have added initial support for DeepSpeed + TP. Few changes were required, as the DeepSpeed APIs were already compatible: we only needed to make sure that the dataloader was compatible with TP and that we were able to save the TP weights. Thanks @inkcherry for the work! #3390
To use TP with DeepSpeed, update your DeepSpeed config file to include the `tensor_parallel` key:
```
...
"tensor_parallel": {
    "autotp_size": ${autotp_size}
},
...
```
More details in this DeepSpeed PR.
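To show where the new key sits in a full config, here is a minimal sketch that builds a DeepSpeed config programmatically. The `autotp_size` value and the neighboring keys are placeholders, not recommendations:

```python
import json

# Illustrative DeepSpeed config containing the new "tensor_parallel" section.
# train_batch_size and autotp_size are placeholder values for this sketch.
ds_config = {
    "train_batch_size": 8,
    "tensor_parallel": {
        "autotp_size": 2,  # hypothetical tensor-parallel degree
    },
}

# Serialize to pass as a JSON config file to DeepSpeed
config_json = json.dumps(ds_config, indent=2)
print(config_json)
```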
## Support for XCCL distributed backend
We've added support for XCCL, an Intel distributed backend that can be used with XPU devices. More details in this torch PR. Thanks @dvrogozh for the integration!
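Conceptually, XCCL slots in next to the existing per-device backends. The helper below is a hypothetical sketch (not an accelerate API) of how a launcher might map a device type to a `torch.distributed` backend name:

```python
# Hypothetical helper: map a device type to a torch.distributed backend name.
# "xccl" for Intel XPUs is the backend added in the torch PR referenced above.
def pick_backend(device_type: str) -> str:
    backends = {
        "cuda": "nccl",  # NVIDIA GPUs
        "xpu": "xccl",   # Intel XPUs
        "cpu": "gloo",   # CPU fallback
    }
    return backends.get(device_type, "gloo")

print(pick_backend("xpu"))  # → xccl
```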
## What's Changed
- Add `log_artifact`, `log_artifacts` and `log_figure` capabilities to the MLflowTracker by @luiz0992 in #3419
- tensor parallel dataloader for deepspeed accelerator by @inkcherry in #3390
- Fix prod issues by @muellerzr in #3441
- Fix attribute issue with deepspeed tp by @SunMarc in #3443
- Fixed typo in the multi node FSDP slurm example script by @JacobB33 in #3447
- feat: Add no_ssh and slurm multinode launcher options for deepspeed by @hsmallbone in #3329
- Fixup ao module filter func by @muellerzr in #3450
- remove device index workaround on xpu since xpu supports integer device index as cuda now by @yao-matrix in #3448
- enable 2 UT cases on XPU by @yao-matrix in #3445
- Fix AMD GPU support with should_reduce_batch_size() by @cameronshinn in #3405
- Fix device KeyError in tied_params_map by @dvrogozh in #3403
- Initial FSDP2 support by @S1ro1 in #3394
- Fix: clip grad norm in fsdp2 by @S1ro1 in #3465
- Update @ by @muellerzr in #3466
- Fix seeding of new generator for multi GPU by @albertcthomas in #3459
- Fix get_balanced_memory for MPS by @booxter in #3464
- Update CometMLTracker to allow re-using experiment by @Lothiraldan in #3328
- Apply ruff py39 fixes by @cyyever in #3461
- xpu: enable xccl distributed backend by @dvrogozh in #3401
- Update ruff target-version to py39 and apply more fixes by @cyyever in #3470
- [MLU] fix deepspeed dependency by @huismiling in #3472
- remove use_xpu to fix ut issues, we don't need this since XPU is OOB … by @yao-matrix in #3460
- Bump ruff to 0.11.2 by @cyyever in #3471
## New Contributors
- @luiz0992 made their first contribution in #3419
- @inkcherry made their first contribution in #3390
- @JacobB33 made their first contribution in #3447
- @hsmallbone made their first contribution in #3329
- @yao-matrix made their first contribution in #3448
- @cameronshinn made their first contribution in #3405
- @S1ro1 made their first contribution in #3394
- @albertcthomas made their first contribution in #3459
- @booxter made their first contribution in #3464
- @Lothiraldan made their first contribution in #3328
- @cyyever made their first contribution in #3461
Full Changelog: v1.5.2...v1.6.0