pypi accelerate 1.14.0
v1.14.0: AMD ROCm support, FSDP2 hardening

9 hours ago

FSDP2 Improvements

This release brings a large batch of FSDP2 fixes and quality-of-life improvements: correct dtype handling on load, sharding of embeddings/norms, QLoRA crash prevention, and a more robust auto-wrap policy.

  • Fsdp2 fully_shard embedding and norm by @SunMarc in #4015
  • Fix fsdp2 load full state dict dtype mismatch by @SunMarc in #4021
  • Fix region compilation fsdpv2 by @SunMarc in #4022
  • [FSDP2] Cast model to uniform dtype before fully_shard to fix mixed-dtype AssertionError by @roycho96 in #3985
  • [FSDP2] Auto-exclude non-floating frozen Params4bit from fully_shard to prevent QLoRA crash by @roycho96 in #3987
  • fix(FSDP2): auto-wrap policy ignoring _no_split_modules fallback by @JohnGiorgi in #3999
  • fix: use key-based matching in fsdp2_load_full_state_dict by @roycho96 in #3982
  • fix: add missing model_has_params4bit guard to fsdp2_load_full_state_dict call by @roycho96 in #3981
  • Fix to-fsdp2: drop REMOVED / NOT_YET_IMPLEMENTED FSDP1 keys instead of leaking them by @lollinng in #4065
  • Prevent double-wrapping models in prepare_model() by @joshuaswanson in #3977
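
As a rough illustration of why key-based matching (#3982) is safer than relying on iteration order, the sketch below pairs entries of two plain dictionaries by name. It does not reproduce Accelerate's `fsdp2_load_full_state_dict`; the helper `match_by_key` and the toy state dicts are hypothetical, and strings stand in for tensors.

```python
def match_by_key(full_state, target_state):
    """Pair every target entry with the source entry that has the same name.

    Hypothetical helper for illustration only; not part of Accelerate.
    """
    missing = [name for name in target_state if name not in full_state]
    if missing:
        raise KeyError(f"keys missing from the full state dict: {missing}")
    return {name: full_state[name] for name in target_state}


# Toy "full" state dict; the string values stand in for tensors.
full = {"embed.weight": "E", "norm.weight": "N", "head.weight": "H"}

# The target holds a subset of the parameters, in a different order.
target = {"norm.weight": None, "embed.weight": None}

# Positional pairing silently assigns the wrong values.
positional = dict(zip(target, full.values()))

# Name-based pairing is independent of ordering and of extra source keys.
by_key = match_by_key(full, target)

print(positional)
print(by_key)
```

Positional pairing only works when both dictionaries list exactly the same parameters in exactly the same order, which is easy to violate once some parameters are excluded from sharding.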

AMD ROCm support

Accelerate now works end-to-end on AMD ROCm devices. Thanks @Abdennacer-Badaoui!

Neuron

This release continues the Neuron work, reducing recompilation and covering previously missing device cases.

Quantization & Offloading

We improved offloading support for quantized models, including torchao, int8, and tied-weight handling.

Data Loading

  • Feat: Support dynamic batch size in BatchSamplerShard with even_batches by @yuxinyuan in #3969
  • Fix iterable dataset sharding condition when n_shards == num_processes by @SunMarc in #3958
  • Fix implicit padding in split_between_processes when apply_padding=False and num_samples < num_processes by @3manifold in #4052
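
The padding edge case in #4052 can be pictured with a small pure-Python sketch. The function `split_for_rank` below is a hypothetical stand-in, not Accelerate's `split_between_processes`; it assumes a contiguous split in which the first ranks absorb any remainder and padding repeats the last item.

```python
def split_for_rank(items, num_processes, rank, apply_padding=False):
    """Return the contiguous slice of `items` owned by `rank`.

    Hypothetical sketch: the first `extras` ranks receive one extra item.
    With apply_padding=True, short slices are padded with the last item so
    every rank ends up with the same length.
    """
    per_rank, extras = divmod(len(items), num_processes)
    start = rank * per_rank + min(rank, extras)
    end = start + per_rank + (1 if rank < extras else 0)
    chunk = list(items[start:end])
    if apply_padding and items:
        target_len = per_rank + (1 if extras else 0)
        chunk += [items[-1]] * (target_len - len(chunk))
    return chunk


samples = ["a", "b"]  # fewer samples than processes
unpadded = [split_for_rank(samples, 4, r) for r in range(4)]
padded = [split_for_rank(samples, 4, r, apply_padding=True) for r in range(4)]

print(unpadded)
print(padded)
```

Under these assumptions, ranks without data receive an empty list when padding is disabled, rather than a silently repeated sample.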

Minor fixes

  • [DeepSpeed] allow kernels flash-attn in SP by @kashif in #3959
  • Fix: Conditionally import torch.distributed.algorithms.join in accelerator.py by @0xDELUXA in #3962
  • Fix is_hf_initialized attribute by @SunMarc in #3976
  • feat(utils): add max reduction type by @imstevenpmwork in #4027
  • fix(state): make MLU backend part of the _prepare_backend elif chain by @Anai-Guo in #4057
  • fix notebook launcher cuda init by @SunMarc in #4059
  • pytorch-triton-xpu rename to triton-xpu by @sywangyi in #4007
  • Relax numerical tolerance for XPU in test_big_modeling by @YangKai0616 in #4001
  • Fix gloo backend error in test_load_checkpoint_and_dispatch_with_broadcast on XPU by @kaixuanliu in #4056
  • Raise ValueError instead of a bare string in ParallelismConfig.get_device_mesh by @lollinng in #4064
  • tests: Gracefully handle missing set_device for mps by @booxter in #4028
  • test: add regression test for no_split_module_classes accepting set type by @UFO0506 in #4048
  • Fix all tests by @SunMarc in #4072
  • docs: add aggregate profiler memory example by @aryanputta in #4054
  • DOC: document missing parameters in load_accelerator_state, find_executable_batch_size, and send_to_device by @kratos0718 in #4051
  • docs: Fix docstring of fsdp2_prepare_auto_wrap_policy by @slocoro in #4037
  • Fix DistributedType documentation by @3manifold in #3980
  • Fix grammar, spelling, and consistency issues across docs and examples by @cihandemir in #3961
  • docs: fix typos in docstrings, comments, and user docs by @mokashang in #4040
  • chore: update doc-builder workflow SHA by @rtrompier in #4009
  • chore: bump doc-builder SHA for main doc build workflow by @rtrompier in #4018
  • [CI] Bump style-bot SHA + switch to GitHub App by @paulinebm in #4031
  • Fix TrackioTracker.log() ignoring step parameter by @joshuaswanson in #3975
  • fix: pass step parameter in TrackioTracker.log() by @liuyun7345 in #3970
  • fix(tracking): default step=None on tracker.log and accept extra kwargs in MLflowTracker by @1fanwang in #4039
  • Fix MLflowTracker.store_init_configuration mutating the caller's config dict by @ATOM00blue in #4046
  • fix(tracker): guard init_trackers and log against None kwargs by @xodn348 in #4026
  • 🔒 Pin GitHub Actions to commit SHAs by @paulinebm in #3992
  • chore: update build-docker-images-release.yml by @hf-security-analysis[bot] in #4069
  • chore: enable Dependabot weekly GitHub Actions bumps by @hf-dependantbot-rollout[bot] in #4049
  • Bump the actions group with 8 updates by @dependabot[bot] in #4068
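
The class of bug behind #4046 (a tracker mutating the caller's config dict) is easy to reproduce in plain Python. The sketch below is not the MLflowTracker code; `store_config_mutating` and `store_config_safe` are hypothetical names, and converting values to strings is just an assumed example of an in-place transformation.

```python
import copy


def store_config_mutating(config):
    """Anti-pattern: rewrites the caller's dictionary in place."""
    for key in list(config):
        config[key] = str(config[key])
    return config


def store_config_safe(config):
    """Works on a deep copy, so the caller's dictionary is left untouched."""
    local = copy.deepcopy(config)
    for key in list(local):
        local[key] = str(local[key])
    return local


user_config = {"lr": 1e-3, "layers": [64, 64]}

logged = store_config_safe(user_config)
print(user_config)  # still holds the original float and list
print(logged)       # stringified copy, as it might be sent to a tracker
```

Copying at the boundary costs little for small configuration dictionaries and keeps logging free of side effects on training code.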

Full Changelog: v1.13.0...v1.14.0
