## Safetensors default

As of this release, `safetensors` will be the default format saved when applicable! To read more about safetensors and why it's best to use it for safety (and not pickle/`torch.save`), check it out here.
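A minimal sketch of what this looks like in practice (the model and directory below are placeholders, not from the release itself):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()
model = torch.nn.Linear(8, 2)  # placeholder model

# Checkpoints saved through the Accelerator now use the safetensors
# format by default instead of pickled torch.save files.
accelerator.save_model(model, "checkpoint_dir")  # placeholder directory
```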
## New Experiment Trackers

This release has two new experiment trackers, ClearML and DVCLive!

To use them, just pass `clearml` or `dvclive` to `log_with` in the `Accelerator` init. h/t to @eugen-ajechiloae-clearml and @dberenbaum
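A minimal sketch of hooking one up (the project name and metric below are placeholders):

```python
from accelerate import Accelerator

# Pick the tracker via its string identifier; "dvclive" works the same way.
accelerator = Accelerator(log_with="clearml")

accelerator.init_trackers("my-project")        # placeholder project name
accelerator.log({"train_loss": 0.42}, step=1)  # placeholder metric
accelerator.end_training()
```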
## DeepSpeed

- Accelerate's DeepSpeed integration now supports NPU devices, h/t to @statelesshz
- DeepSpeed can now be launched via `accelerate` on single-GPU setups, as sketched below
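For the single-GPU case, a rough sketch of running DeepSpeed without a distributed launcher, i.e. with plain `python script.py` (the ZeRO stage and accumulation steps here are illustrative choices, not prescribed values):

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# Illustrative settings; tune these for your workload.
deepspeed_plugin = DeepSpeedPlugin(zero_stage=2, gradient_accumulation_steps=1)
accelerator = Accelerator(deepspeed_plugin=deepspeed_plugin)
# ... build and prepare() your model/optimizer/dataloader as usual.
```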
## FSDP

FSDP underwent a huge refactor so that the interface when using FSDP is exactly the same as in every other scenario when using `accelerate`. No more needing to call `accelerator.prepare()` twice!
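A sketch of the resulting call pattern, with placeholder model and data (FSDP itself is assumed to be configured via `accelerate config`):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()
model = torch.nn.Linear(8, 2)                               # placeholder
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)  # placeholder
dataloader = DataLoader(TensorDataset(torch.randn(32, 8)), batch_size=4)

# A single prepare() now covers FSDP too; previously the model had to be
# prepared first, the optimizer rebuilt on it, and prepare() called again.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```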
## Other useful enhancements

- We now raise an error and try to disable P2P communication on consumer GPUs of the 3090 series and beyond. Without this, users were seeing timeout issues and the like, as NVIDIA dropped P2P support. When using `accelerate launch`, we will disable it automatically, and if we sense that it is still enabled on distributed setups using 3090s or newer, we will raise an error.
- When calling `.gather()`, if tensors are on different devices we now explicitly raise an error (for now only valid on CUDA); see the sketch after this list.
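A sketch of the `.gather()` contract this enforces (one scalar per process, already on the accelerator's device):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()

# Tensors must live on each process's accelerator device; mixing devices
# (e.g. a stray CPU tensor in a CUDA run) now raises an explicit error.
tensor = torch.tensor([accelerator.process_index], device=accelerator.device)
gathered = accelerator.gather(tensor)  # shape: (num_processes,)
```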
## Bug fixes

- Fixed a bug that caused dataloaders to not shuffle, despite `shuffle=True`, when using multiple GPUs and the new `SeedableRandomSampler`.
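A sketch of the now-correct behavior, assuming a seeded multi-GPU run (the seed, dataset, and batch size are placeholders):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator
from accelerate.utils import set_seed

set_seed(42)  # placeholder seed
accelerator = Accelerator()

dataset = TensorDataset(torch.arange(100.0))  # placeholder dataset
# shuffle=True is honored again on multi-GPU: the prepared dataloader
# reshuffles each epoch, consistently across processes.
dataloader = accelerator.prepare(DataLoader(dataset, batch_size=8, shuffle=True))
```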
## General Changelog
- Add logs offloading by @SunMarc in #2075
- Add ClearML tracker by @eugen-ajechiloae-clearml in #2034
- CRITICAL: fix failing ci by @muellerzr in #2088
- Fix flag typo by @kuza55 in #2090
- Fix batch sampler by @muellerzr in #2097
- fixed ip address typo by @Fluder-Paradyne in #2099
- Fix memory leak in fp8 causing OOM (and potentially 3x vRAM usage) by @muellerzr in #2089
- fix warning when offload by @SunMarc in #2105
- Always use SeedableRandomSampler by @muellerzr in #2110
- Fix issue with tests by @muellerzr in #2111
- Make SeedableRandomSampler the default always by @muellerzr in #2117
- Use "and" instead of comma in Bibtex citation by @qgallouedec in #2119
- Add explicit error if empty batch received by @YuryYakhno in #2115
- Allow for ACCELERATE_SEED env var by @muellerzr in #2126
- add DeepSpeed support for NPU by @statelesshz in #2054
- Sync states for npu fsdp by @jq460494839 in #2113
- Fix import error when torch>=2.0.1 and torch.distributed is disabled by @natsukium in #2121
- Make safetensors the default by @muellerzr in #2120
- Raise error when saving with param on meta device by @SunMarc in #2132
- Leave native `save` as `False` by @muellerzr in #2138
- fix retie_parameters by @SunMarc in #2137
- Deal with shared memory scenarios by @muellerzr in #2136
- specify config file path on README by @kwonmha in #2140
- Fix safetensors contiguous by @SunMarc in #2145
- Fix more tests by @muellerzr in #2146
- [docs] fixed a couple of broken links by @MKhalusova in #2147
- [docs] troubleshooting guide by @MKhalusova in #2133
- [Docs] fix doc typos by @kashif in #2150
- Add note about GradientState being in-sync with the dataloader by default by @muellerzr in #2134
- Deprecated runner stuff by @muellerzr in #2152
- Add examples to tests by @muellerzr in #2131
- Disable pypi for merge workflows + fix trainer tests by @muellerzr in #2153
- Adds dvclive tracker by @dberenbaum in #2139
- check port availability only in main deepspeed/torchrun launcher by @Jingru in #2078
- Do not attempt to pad nested tensors by @frankier in #2041
- Add warning for problematic libraries by @muellerzr in #2151
- Add ZeRO++ to DeepSpeed usage docs by @SumanthRH in #2166
- Fix Megatron-LM Arguments Bug by @yuanenming in #2168
- Fix non persistant buffer dispatch by @SunMarc in #1941
- Updated torchrun instructions by @TJ-Solergibert in #2096
- New CI Runners by @muellerzr in #2087
- Revert "New CI Runners" by @muellerzr in #2172
- [Working again] New CI by @muellerzr in #2173
- fsdp refactoring by @pacman100 in #2177
- Pin DVC by @muellerzr in #2196
- Apply DVC warning to Accelerate by @muellerzr in #2197
- Explicitly disable P2P using `launch`, and pick up in `state` if a user will face issues by @muellerzr in #2195
- Better error when device mismatches when calling gather() on CUDA by @muellerzr in #2180
- unpins dvc by @dberenbaum in #2200
- Assemble state dictionary for offloaded models by @blbadger in #2156
- Allow deepspeed without distributed launcher by @pacman100 in #2204
## New Contributors
- @eugen-ajechiloae-clearml made their first contribution in #2034
- @kuza55 made their first contribution in #2090
- @Fluder-Paradyne made their first contribution in #2099
- @YuryYakhno made their first contribution in #2115
- @jq460494839 made their first contribution in #2113
- @kwonmha made their first contribution in #2140
- @dberenbaum made their first contribution in #2139
- @Jingru made their first contribution in #2078
- @frankier made their first contribution in #2041
- @yuanenming made their first contribution in #2168
- @TJ-Solergibert made their first contribution in #2096
- @blbadger made their first contribution in #2156
Full Changelog: v0.24.1...v0.25.0