## Safetensors default

As of this release, `safetensors` will be the default format saved when applicable! To read more about safetensors and why it's best to use it for safety (and not pickle/`torch.save`), check it out here.
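A minimal sketch of what this looks like in practice (the model and directory below are placeholders, not from the release itself):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()
model = torch.nn.Linear(8, 2)  # placeholder model

# Checkpoints saved through the Accelerator now use the safetensors
# format by default instead of pickled torch.save files.
accelerator.save_model(model, "checkpoint_dir")  # placeholder directory
```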
## New Experiment Trackers

This release has two new experiment trackers, ClearML and DVCLive!

To use them, just pass `clearml` or `dvclive` to `log_with` in the `Accelerator` init. h/t to @eugen-ajechiloae-clearml and @dberenbaum
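A minimal sketch of hooking one up (the project name and metric below are placeholders):

```python
from accelerate import Accelerator

# Pick the tracker via its string identifier; "dvclive" works the same way.
accelerator = Accelerator(log_with="clearml")

accelerator.init_trackers("my-project")        # placeholder project name
accelerator.log({"train_loss": 0.42}, step=1)  # placeholder metric
accelerator.end_training()
```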
## DeepSpeed

- Accelerate's DeepSpeed integration now supports NPU devices, h/t to @statelesshz
- DeepSpeed can now be launched via `accelerate` on single-GPU setups, as sketched below
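For the single-GPU case, a rough sketch of running DeepSpeed without a distributed launcher, i.e. with plain `python script.py` (the ZeRO stage and accumulation steps here are illustrative choices, not prescribed values):

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# Illustrative settings; tune these for your workload.
deepspeed_plugin = DeepSpeedPlugin(zero_stage=2, gradient_accumulation_steps=1)
accelerator = Accelerator(deepspeed_plugin=deepspeed_plugin)
# ... build and prepare() your model/optimizer/dataloader as usual.
```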
## FSDP

FSDP underwent a huge refactor so that the interface when using FSDP is exactly the same as in every other scenario when using `accelerate`. No more needing to call `accelerator.prepare()` twice!
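A sketch of the resulting call pattern, with placeholder model and data (FSDP itself is assumed to be configured via `accelerate config`):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()
model = torch.nn.Linear(8, 2)                               # placeholder
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)  # placeholder
dataloader = DataLoader(TensorDataset(torch.randn(32, 8)), batch_size=4)

# A single prepare() now covers FSDP too; previously the model had to be
# prepared first, the optimizer rebuilt on it, and prepare() called again.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```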
## Other useful enhancements

- We now raise an error and try to disable P2P communication on consumer GPUs of the 3090 series and beyond. Without this, users were seeing timeout issues and the like, as NVIDIA dropped P2P support. When using `accelerate launch`, we will disable it automatically, and if we sense that it is still enabled on distributed setups using 3090s or newer, we will raise an error.
- When calling `.gather()`, if tensors are on different devices we now explicitly raise an error (for now only valid on CUDA); see the sketch after this list.
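A sketch of the `.gather()` contract this enforces (one scalar per process, already on the accelerator's device):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()

# Tensors must live on each process's accelerator device; mixing devices
# (e.g. a stray CPU tensor in a CUDA run) now raises an explicit error.
tensor = torch.tensor([accelerator.process_index], device=accelerator.device)
gathered = accelerator.gather(tensor)  # shape: (num_processes,)
```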
## Bug fixes

- Fixed a bug that caused dataloaders to not shuffle, despite `shuffle=True`, when using multiple GPUs and the new `SeedableRandomSampler`.
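A sketch of the now-correct behavior, assuming a seeded multi-GPU run (the seed, dataset, and batch size are placeholders):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator
from accelerate.utils import set_seed

set_seed(42)  # placeholder seed
accelerator = Accelerator()

dataset = TensorDataset(torch.arange(100.0))  # placeholder dataset
# shuffle=True is honored again on multi-GPU: the prepared dataloader
# reshuffles each epoch, consistently across processes.
dataloader = accelerator.prepare(DataLoader(dataset, batch_size=8, shuffle=True))
```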
## General Changelog
- Add logs offloading by @SunMarc in #2075
- Add ClearML tracker by @eugen-ajechiloae-clearml in #2034
- CRITICAL: fix failing ci by @muellerzr in #2088
- Fix flag typo by @kuza55 in #2090
- Fix batch sampler by @muellerzr in #2097
- fixed ip address typo by @Fluder-Paradyne in #2099
- Fix memory leak in fp8 causing OOM (and potentially 3x vRAM usage) by @muellerzr in #2089
- fix warning when offload by @SunMarc in #2105
- Always use SeedableRandomSampler by @muellerzr in #2110
- Fix issue with tests by @muellerzr in #2111
- Make SeedableRandomSampler the default always by @muellerzr in #2117
- Use "and" instead of comma in Bibtex citation by @qgallouedec in #2119
- Add explicit error if empty batch received by @YuryYakhno in #2115
- Allow for ACCELERATE_SEED env var by @muellerzr in #2126
- add DeepSpeed support for NPU by @statelesshz in #2054
- Sync states for npu fsdp by @jq460494839 in #2113
- Fix import error when torch>=2.0.1 and torch.distributed is disabled by @natsukium in #2121
- Make safetensors the default by @muellerzr in #2120
- Raise error when saving with param on meta device by @SunMarc in #2132
- Leave native `save` as `False` by @muellerzr in #2138
- fix retie_parameters by @SunMarc in #2137
- Deal with shared memory scenarios by @muellerzr in #2136
- specify config file path on README by @kwonmha in #2140
- Fix safetensors contiguous by @SunMarc in #2145
- Fix more tests by @muellerzr in #2146
- [docs] fixed a couple of broken links by @MKhalusova in #2147
- [docs] troubleshooting guide by @MKhalusova in #2133
- [Docs] fix doc typos by @kashif in #2150
- Add note about GradientState being in-sync with the dataloader by default by @muellerzr in #2134
- Deprecated runner stuff by @muellerzr in #2152
- Add examples to tests by @muellerzr in #2131
- Disable pypi for merge workflows + fix trainer tests by @muellerzr in #2153
- Adds dvclive tracker by @dberenbaum in #2139
- check port availability only in main deepspeed/torchrun launcher by @Jingru in #2078
- Do not attempt to pad nested tensors by @frankier in #2041
- Add warning for problematic libraries by @muellerzr in #2151
- Add ZeRO++ to DeepSpeed usage docs by @SumanthRH in #2166
- Fix Megatron-LM Arguments Bug by @yuanenming in #2168
- Fix non persistant buffer dispatch by @SunMarc in #1941
- Updated torchrun instructions by @TJ-Solergibert in #2096
- New CI Runners by @muellerzr in #2087
- Revert "New CI Runners" by @muellerzr in #2172
- [Working again] New CI by @muellerzr in #2173
- fsdp refactoring by @pacman100 in #2177
- Pin DVC by @muellerzr in #2196
- Apply DVC warning to Accelerate by @muellerzr in #2197
- Explicitly disable P2P using `launch`, and pick up in `state` if a user will face issues by @muellerzr in #2195
- Better error when device mismatches when calling gather() on CUDA by @muellerzr in #2180
- unpins dvc by @dberenbaum in #2200
- Assemble state dictionary for offloaded models by @blbadger in #2156
- Allow deepspeed without distributed launcher by @pacman100 in #2204
## New Contributors
- @eugen-ajechiloae-clearml made their first contribution in #2034
- @kuza55 made their first contribution in #2090
- @Fluder-Paradyne made their first contribution in #2099
- @YuryYakhno made their first contribution in #2115
- @jq460494839 made their first contribution in #2113
- @kwonmha made their first contribution in #2140
- @dberenbaum made their first contribution in #2139
- @Jingru made their first contribution in #2078
- @frankier made their first contribution in #2041
- @yuanenming made their first contribution in #2168
- @TJ-Solergibert made their first contribution in #2096
- @blbadger made their first contribution in #2156
Full Changelog: v0.24.1...v0.25.0