What's Changed
- A new `GradientAccumulationPlugin` has been added to handle more configurations with the `GradientState`. Specifically, through it you can optionally disable having Accelerate automatically adjust the length of the scheduler relative to gradient accumulation steps. Otherwise, Accelerate will now automatically ensure that schedulers built without gradient accumulation in mind still work correctly during gradient accumulation.
- Some fixes related to the launch configuration and TPU launches were made, and the `dynamo_backend` warning has been silenced.
- Big model inference saw a number of fixes related to linear layers, `drop_last` on linear layers, tied weight loading, and handling of multiple tied parameters.
- A new integration example with RunhouseML has been added; read more here: https://github.com/huggingface/accelerate/tree/main/examples#simple-multi-gpu-hardware-launcher
Breaking Changes
`find_tied_parameters` now deals with groups of tied parameters (instead of only pairs of them). As a result, it now returns a list of lists of strings instead of a dictionary.
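To make the new return format concrete, here is a hypothetical sketch (the `group_tied` helper below is illustrative only, not Accelerate's `find_tied_parameters`) of why groups are needed: when three or more parameters share one weight, a pair-based dictionary cannot represent the tie cleanly, while a list of groups can.

```python
# Illustrative sketch of the breaking change: tied parameters are now
# reported as groups (a list of lists of names) rather than pairs.

def group_tied(pairs):
    """Merge (name_a, name_b) tie pairs into connected groups of names."""
    groups = []
    for a, b in pairs:
        # Find any existing groups touching either name and merge them.
        hits = [g for g in groups if a in g or b in g]
        merged = {a, b}.union(*hits) if hits else {a, b}
        groups = [g for g in groups if g not in hits] + [merged]
    return [sorted(g) for g in groups]

# Three parameters all sharing one weight tensor: the old pair mapping
# (e.g. {"decoder.weight": "lm_head.weight"}) loses the third member,
# whereas the new format returns one group per set of tied weights.
pairs = [
    ("embed.weight", "decoder.weight"),
    ("decoder.weight", "lm_head.weight"),
]
print(group_tied(pairs))
# [['decoder.weight', 'embed.weight', 'lm_head.weight']]
```

Code that iterated over the old dictionary's `.items()` will need to iterate over the returned groups instead.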
What's New?
- Add documentation around FSDP state dict save behavior by @VikParuchuri in #1181
- add `use_orig_params` to FullyShardedDataParallelPlugin by @pacman100 in #1184
- Only convert linear layers with weights multiple of 16 by @sgugger in #1188
- Set drop last to ensure modulo16 restriction for fp8 by @ksivaman in #1189
- Accelerator should not call `to` on modules that wrap `accelerate`-loaded models by @younesbelkada in #1172
- Fixup passing overlapping args to the script by @muellerzr in #1198
- Make the Scheduler adjust the steps taken relative to the gradient accumulation steps by @muellerzr in #1187
- Fix tied weights load by @sgugger in #1204
- Better error message when using multi-GPU and Accelerate on torch <1.9.1 by @muellerzr in #1203
- Fix typo in TPU config by @muellerzr in #1202
- Fix example in accumulate method documentation by @VikParuchuri in #1211
- ds offload optim fix to use CPUAdam by @pacman100 in #1208
- Move when the GradientState test is performed so that it is not None by @muellerzr in #1219
- Fix bug in loading launch config by @neumyor in #1218
- Fix get_logger kwarg documentation issue by @bcol23 in #1222
- docs: add finetuner to ppl who use accelerate by @bwanglzu in #1224
- Silence dynamo_backend by @muellerzr in #1226
- Add additional check when patching env by @Chris-hughes10 in #1229
- Make grad accum steps mutable on the Accelerator object by @muellerzr in #1233
- devcontainer: "extensions" has been removed and replaced by customizations by @dbpprt in #1075
- remove empty dicts while saving accelerate config by @pacman100 in #1236
- backfill ds plugin attributes when using ds_config by @pacman100 in #1235
- Change multinode to multigpu in notebook tutorial by @muellerzr in #1247
- Hardware Auto-Setup Example/Tutorial for Distributed Launch by @carolineechen in #1227
- Handle multiple tied parameters by @sgugger in #1241
New Contributors
- @hackpert made their first contribution in #1180
- @VikParuchuri made their first contribution in #1181
- @ksivaman made their first contribution in #1189
- @neumyor made their first contribution in #1218
- @bcol23 made their first contribution in #1222
- @bwanglzu made their first contribution in #1224
- @carolineechen made their first contribution in #1227
Full Changelog: v0.17.1...v0.18.0