v1.13.0: Neuron support, IPEX removal, and distributed training fixes


AWS Neuron support

We now have support for AWS Neuron (Trainium and Inferentia) devices. Thanks to @michaelbenayoun for adding this.

XPU Improvements

We've removed the IPEX dependency and improved device-agnostic code for XPU.

FSDP2 Improvements

We've added several important fixes for FSDP2 users: upcasting only parameters that require gradients, clearer errors for tied embeddings, DCP optimizer state loading, a fix for an optimizer.step crash with bf16 models, and compatibility with torch < 2.7.0.

  • Upcast FSDP2 parameters only if requires_grad by @ojh31 in #3848
  • Fix FSDP2 tied embedding errors with targeted ValueError guidance by @amanzoni1 in #3878
  • bug: fsdp cannot load optimizer state using dcp by @flymin in #3904
  • fix crash in optimizer.step when fsdp2 is enabled and model is bfloat16 by @sywangyi in #3905
  • Fix FSDP2 crash with ignored_params on torch < 2.7.0 by @Mr-Neutr0n in #3924
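To illustrate what the upcast fix (#3848) changes, here is a toy sketch of the behavior. `Param` and string dtypes are illustrative stand-ins, not torch or Accelerate classes:

```python
from dataclasses import dataclass


@dataclass
class Param:
    # Stand-in for a torch parameter; the real fix operates on torch.nn.Parameter.
    name: str
    dtype: str
    requires_grad: bool


def upcast_for_mixed_precision(params):
    """Upcast only trainable parameters to fp32.

    Sketch of the fixed behavior: frozen (requires_grad=False) parameters
    keep their low-precision dtype instead of being upcast and wasting memory.
    """
    for p in params:
        if p.requires_grad and p.dtype == "bf16":
            p.dtype = "fp32"
    return params


params = [
    Param("lm_head.weight", "bf16", True),
    Param("frozen_embedding.weight", "bf16", False),
]
upcast_for_mixed_precision(params)
print([(p.name, p.dtype) for p in params])
# [('lm_head.weight', 'fp32'), ('frozen_embedding.weight', 'bf16')]
```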

DeepSpeed Sequence Parallelism

We've added several fixes to the DeepSpeed + Sequence Parallelism integration introduced in v1.12.0, including evaluation support during SP training and proper process group handling.

  • [SP] fix loss computation example by @kashif in #3858
  • [SP and CP] error out if both CP and SP enabled by @kashif in #3862
  • DeepSpeed has its own process group by @kashif in #3916
  • [Deepspeed] skip device mesh creation when deepspeed and sp_size >1 by @kashif in #3914
  • Enable evaluation during deepspeed Sequence Parallel by @jp1924 in #3917

FP8

We've improved FP8 training. Thanks to @shimizust for fixing torchao support.

Performance

Accelerate now imports faster by deferring heavy dependencies, and torch.compile hooks are disabled lazily.

Minor fixes
