accelerate 0.17.0 on Python PyPI

PyTorch 2.0 support

This release fully supports the upcoming PyTorch 2.0 release. You can choose whether to use torch.compile, then customize its options through accelerate config or a TorchDynamoPlugin.

update support for torch dynamo compile by @pacman100 in #1150

Process Control Enhancements

This release adds a new PartialState, which contains most of the capabilities of the AcceleratorState but is designed to be used directly by the user for process control. As a result, users no longer need an if accelerator.state.is_main_process check when using classes such as the Tracking API, because these now do their work on the main process only by default.

Refactor process executors to be in AcceleratorState by @muellerzr in #1039
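
To illustrate the main-process gating idea described above, here is a minimal standard-library sketch. It is not Accelerate's implementation: the helpers process_index, on_main_process and log_metrics are defined locally for illustration only, and the sketch assumes the launcher exports the global rank in a RANK environment variable (as torchrun-style launchers do), falling back to rank 0 for single-process runs.

```python
import functools
import os


def process_index() -> int:
    """Return this process's global rank (0 when not launched in distributed mode)."""
    # Assumption: the launcher sets RANK; a plain `python script.py` run has no RANK.
    return int(os.environ.get("RANK", "0"))


def on_main_process(func):
    """Illustrative decorator: call `func` on rank 0 only, return None elsewhere."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if process_index() == 0:
            return func(*args, **kwargs)
        return None  # non-main processes skip the work entirely

    return wrapper


@on_main_process
def log_metrics(step: int, loss: float) -> str:
    """Stand-in for a tracker call that should happen once per job, not once per process."""
    line = f"step={step} loss={loss:.4f}"
    print(line)
    return line
```

With a gate like this built into the logging call itself, calling code can invoke log_metrics(step, loss) unconditionally on every process, which is the pattern the release describes for trackers.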

TPU Pod Support (Experimental)

Launching from TPU pods is now supported. Please see this issue for more information.

Introduce TPU Pod launching to accelerate launch by @muellerzr in #1049

FP8 mixed precision training (Experimental)

This release adds experimental support for FP8 mixed precision training, which requires the transformer-engine library as well as a Hopper GPU (or higher).

Fp8 integration by @sgugger in #1086

What's new?

v0.17.0.dev0 by @sgugger (direct commit on main)
Deepspeed param check by @dhar174 in #1015
enabling mps device by default and removing related config by @pacman100 in #1030
fix: links to gradient synchronization by @prassanna-ravishankar in #1035
do not scale gradient in bf16 mode by @kashif in #1036
Pass keywords arguments of backward function deeper to DeepSpeed by @DistinctVision in #1037
Add daily slack notifier for nightlies by @muellerzr in #1042
Make sure direct parameters are properly set on device by @sgugger in #1043
Add cpu_offload_with_hook by @sgugger in #1045
Update quality tools to 2023 by @sgugger in #1046
Load tensors directly on device by @sgugger in #1028
Fix cpu_offload_with_hook code snippet by @pcuenca in #1047
Use create_task by @muellerzr in #1052
Fix args by adding in the defaults by @muellerzr in #1053
deepspeed hidden_size auto value default fixes by @pacman100 in #1060
Introduce PartialState by @muellerzr in #1055
Flag for deprecation by @muellerzr in #1061
Try with this by @muellerzr in #1062
Update integrations by @muellerzr in #1063
Swap utils over to use PartialState by @muellerzr in #1065
update fsdp docs and removing deepspeed version pinning by @pacman100 in #1059
Fix/implement process-execution decorators on the Accelerator by @muellerzr in #1070
Refactor state and make PartialState first class citizen by @muellerzr in #1071
Add error if passed --config_file does not exist by @muellerzr in #1074
SageMaker image_uri is now optional by @ in #1077
Allow custom SageMaker Estimator arguments by @ in #1080
Fix tpu_cluster arg by @muellerzr in #1081
Update complete_cv_example.py by @fcossio in #1082
Added SageMaker local mode config section by @ in #1084
Fix config by @muellerzr in #1090
adds missing "lfs" in pull by @CSchoel in #1091
add multi_cpu support to reduce by @alex-hh in #1094
Update README.md by @BM-K in #1100
Tracker rewrite and lazy process checker by @muellerzr in #1079
Update performance.mdx by @fcossio in #1107
Attempt to unwrap tracker. by @pcuenca in #1109
TensorBoardTracker: wrong arg def by @stas00 in #1111
Actually raise if exception by @muellerzr in #1124
Add test for ops and fix reduce by @muellerzr in #1122
Deep merge SageMaker additional_args, allowing more flexible configuration and env variable support by @dbpprt in #1113
Move dynamo.optimize to the end of model preparation by @ymwangg in #1128
Refactor launch for greater extensibility by @Yard1 in #1123
[Big model loading] Correct GPU only loading by @patrickvonplaten in #1121
Add tee and role to launch by @muellerzr in #1132
Expand warning and grab all GPUs available by default by @muellerzr in #1134
Fix multinode with GPU ids when each node has 1 by @muellerzr in #1127
deepspeed dataloader prepare fix by @pacman100 in #1126
fix ds dist init kwargs issue by @pacman100 in #1138
fix lr scheduler issue by @pacman100 in #1140
fsdp bf16 enable autocast by @pacman100 in #1125
Fix notebook_launcher by @muellerzr in #1141
fix partial state by @pacman100 in #1144
FSDP enhancements and fixes by @pacman100 in #1145
Fixed typos in notebook by @SamuelLarkin in #1146
Include a note in the gradient synchronization docs on "what can go wrong" and show the timings by @muellerzr in #1153
[Safetensors] Relax missing metadata constraint by @patrickvonplaten in #1151
Solve arrow keys being environment dependant for accelerate config by @p1atdev (direct commit on main)
Load custom state to cpu by @Guangxuan-Xiao in #1156
📝 add a couple more trackers to the docs by @nateraw in #1158
Let GradientState know active dataloaders and reset the remainder by @muellerzr in #1162
Attempt to fix import error when PyTorch is build without torch.distributed module by @mfuntowicz in #1108
[Accelerator] Fix issue with 8bit models by @younesbelkada in #1155
Document skip_first_batches in the checkpoint usage guides by @muellerzr in #1164
Fix what files get deleted through total_limit by @muellerzr in #1165
Remove outdated command directions and use in tests by @muellerzr in #1166

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@Yard1
- Refactor launch for greater extensibility (#1123)
