accelerate 0.26.0 on Python PyPI

Support for MS-AMP

This release adds support for MS-AMP (Microsoft Automatic Mixed Precision Library) to Accelerate as an alternative backend for FP8 training on appropriate hardware. It is now the default FP8 backend of choice. Read more in the docs. Introduced in #2232 by @muellerzr
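
As a rough illustration of how the FP8 backend can be selected, here is a minimal sketch. It assumes the FP8RecipeKwargs handler from accelerate.utils and its backend argument as they exist around this release. The helper name build_fp8_accelerator is ours, and later versions may expose this differently, so check the docs for your installed version.

```python
def build_fp8_accelerator(backend: str = "msamp"):
    """Return an Accelerator set up for FP8 training with the chosen backend.

    Hypothetical helper for illustration only; not part of Accelerate.
    """
    # Imported lazily so the module can be loaded on machines
    # that do not have accelerate installed.
    from accelerate import Accelerator
    from accelerate.utils import FP8RecipeKwargs

    # Assumption: FP8RecipeKwargs accepts a `backend` argument naming the
    # FP8 implementation to use ("msamp" for MS-AMP).
    fp8_kwargs = FP8RecipeKwargs(backend=backend)
    return Accelerator(mixed_precision="fp8", kwarg_handlers=[fp8_kwargs])
```

Calling the helper requires accelerate, MS-AMP, and FP8-capable hardware. Passing backend="te" should select Transformer Engine instead.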

Core

The prior release introduced a new sampler for the DataLoader. While it showed no statistical differences in results across seeds, repeating the same seed would produce a different end accuracy, which was scary to some users. We have now disabled this behavior by default, as it required some additional setup, and brought back the original implementation. To use the new sampling technique (which can provide more accurate repeated results), pass use_seedable_sampler=True to the Accelerator. We will be propagating this up to the Trainer soon.
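
To make the idea of a seedable sampler concrete, here is a small pure-Python sketch. It is not Accelerate's implementation: the class SeedableSamplerSketch and its methods are hypothetical, and it only illustrates how deriving the shuffle order from a seed and an epoch counter makes repeated runs produce the same batches.

```python
import random
from typing import Iterator


class SeedableSamplerSketch:
    """Hypothetical illustration of a seedable sampler (not Accelerate's code)."""

    def __init__(self, num_samples: int, seed: int = 0) -> None:
        self.num_samples = num_samples
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch: int) -> None:
        # Changing the epoch changes the shuffle order in a repeatable way.
        self.epoch = epoch

    def __iter__(self) -> Iterator[int]:
        # A fresh RNG is built on every pass, so the order depends only on
        # (seed, epoch) and not on any global random state.
        rng = random.Random(self.seed + self.epoch)
        indices = list(range(self.num_samples))
        rng.shuffle(indices)
        return iter(indices)


first_run = list(SeedableSamplerSketch(10, seed=42))
second_run = list(SeedableSamplerSketch(10, seed=42))
```

Here first_run and second_run hold the same permutation of the indices 0 to 9. In Accelerate itself, the opt-in is the use_seedable_sampler=True argument described above.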

Big Model Inference

NPU support was added thanks to @statelesshz in #2222
When generating an automatic device_map, we've made it possible to not return grouped key results if desired in #2233
We now handle corner cases better when users pass device_map="cuda", etc., thanks to @younesbelkada in #2254
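
For readers unsure what "grouped key results" means, the sketch below shows the general idea in pure Python: entries that share a parent module and a device are collapsed into a single parent key. The function group_device_map is a hypothetical illustration, not Accelerate's code. To our understanding the option from #2233 is exposed as a clean_result argument on infer_auto_device_map, but treat that name as an assumption and verify it against your installed version.

```python
from typing import Dict, Union

Device = Union[int, str]


def group_device_map(device_map: Dict[str, Device]) -> Dict[str, Device]:
    """Collapse sibling entries on one device into their parent key.

    Hypothetical sketch of the grouping idea; not Accelerate's implementation.
    """
    grouped = dict(device_map)
    changed = True
    while changed:
        changed = False
        # Candidate parents are the dotted prefixes of the current keys.
        parents = {key.rsplit(".", 1)[0] for key in grouped if "." in key}
        for parent in sorted(parents, key=len, reverse=True):
            children = [key for key in grouped if key.startswith(parent + ".")]
            devices = {grouped[key] for key in children}
            if len(devices) == 1:
                # Every entry under this parent lives on the same device,
                # so one parent entry carries the same information.
                for key in children:
                    del grouped[key]
                grouped[parent] = devices.pop()
                changed = True
                break
    return grouped


ungrouped = {
    "encoder.layer.0": 0,
    "encoder.layer.1": 0,
    "decoder.layer.0": 0,
    "decoder.layer.1": 1,
    "lm_head": 1,
}
grouped = group_device_map(ungrouped)
```

In this example the two encoder entries collapse into a single "encoder" key, while the decoder layers stay separate because they sit on different devices. Skipping the grouping keeps the per-module keys, which can be easier to post-process.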

FSDP and DeepSpeed

Many improvements to the docs have been made thanks to @stas00. Along with this, we've made it easier to adjust the sharding strategy and other config values thanks to @pacman100 in #2288
A regression introduced in Accelerate 0.23.0 caused learning to be much slower on multi-GPU setups compared to a single GPU. #2304 has now fixed this thanks to @pacman100
The DeepSpeed integration now also handles auto values better when making a configuration in #2313
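
As a sketch of what "auto" values look like in practice, the snippet below builds a DeepSpeed-style config as a Python dict and lists the entries still set to "auto", which the integration is expected to fill in at prepare time. The helper find_auto_keys is hypothetical and only for illustration. The config keys are standard DeepSpeed options, but the selection here is an assumption, not the exact set touched by these PRs.

```python
from typing import Any, Dict, List

# Example config; "auto" marks values left for the integration to resolve.
ds_config: Dict[str, Any] = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "zero_optimization": {
        "stage": 3,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
    },
}


def find_auto_keys(config: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return dotted paths of all entries whose value is the string "auto".

    Hypothetical helper for illustration; not part of Accelerate or DeepSpeed.
    """
    found: List[str] = []
    for key, value in config.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested sections such as "zero_optimization".
            found.extend(find_auto_keys(value, prefix=f"{path}."))
        elif value == "auto":
            found.append(path)
    return found


auto_keys = find_auto_keys(ds_config)
```

A check like this can be handy before launching a run, to see which values are still left for the integration to resolve.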

Bits and Bytes

Params4bit added to bnb classes in set_module_tensor_to_device() by @poedator in #2315

Device Agnostic Testing

For developers, we've made it much easier to run the tests on different devices with no change to the code thanks to @statelesshz in #2123 and #2235

Bug Fixes

Check notebook launcher for 3090+ by @muellerzr in #2212
Fix dtype bug when offload_state_dict=True and dtype is specified by @fxmarty in #2116
fix tqdm wrapper to print when process id ==0 by @kashif in #2223
fix BFloat16 is not supported on MPS (#2226) by @jxysoft in #2227
Fix MpDeviceLoaderWrapper not having attribute batch_sampler by @vanbasten23 in #2242
[deepspeed] fix setting auto values for comm buffers by @stas00 in #2295
Fix infer_auto_device_map when tied weights share the same prefix name by @fxmarty in #2324
Fixes bug in swapping weights when replacing with Transformer-Engine layers by @sudhakarsingh27 in #2305
Fix breakpoint API in test_script.py on TPU. by @vanbasten23 in #2263
Bring old seed technique back by @muellerzr in #2319

Major Contributors

@statelesshz for their work on device-agnostic testing and NPU support
@stas00 for many doc fixes related to DeepSpeed and FSDP

General Changelog

add missing whitespace by @stas00 in #2206
MNT Delete the delete doc workflows by @BenjaminBossan in #2217
Update docker images by @muellerzr in #2213
Add allgather check for xpu by @abhilash1910 in #2199
Check notebook launcher for 3090+ by @muellerzr in #2212
Fix dtype bug when offload_state_dict=True and dtype is specified by @fxmarty in #2116
fix tqdm wrapper to print when process id ==0 by @kashif in #2223
[data_loader] expand the error message by @stas00 in #2221
Update the 'Frameworks using Accelerate' section to include Amphion by @RMSnow in #2225
[Docs] Add doc for cpu/disk offload by @SunMarc in #2231
device agnostic testing by @statelesshz in #2123
Make cleaning optional for device map by @muellerzr in #2233
Add npu support to big model inference by @statelesshz in #2222
fix the DS failing test by @pacman100 in #2237
Fix nb tests by @muellerzr in #2230
fix BFloat16 is not supported on MPS (#2226) by @jxysoft in #2227
Fix MpDeviceLoaderWrapper not having attribute batch_sampler by @vanbasten23 in #2242
[Big-Modeling] Harmonize device check to handle corner cases by @younesbelkada in #2254
Support log_images for aim tracker by @Justin900429 in #2257
Integrate MS-AMP Support for FP8 as a separate backend by @muellerzr in #2232
refactor deepspeed dataloader prepare logic by @pacman100 in #2238
device agnostic deepspeed&fsdp testing by @statelesshz in #2235
Solve CUDA issues by @muellerzr in #2272
Uninstall DVC in the Trainer tests by @muellerzr in #2271
Rm DVCLive from test reqs as latest version causes failures by @muellerzr in #2279
typo fix by @stas00 in #2276
Add condition before using check_tied_parameters_on_same_device by @SunMarc in #2218
[doc] FSDP improvements by @stas00 in #2274
[deepspeed docs] auto-values aren't being covered by @stas00 in #2286
Improve FSDP config usability by @pacman100 in #2288
[doc] language fixes by @stas00 in #2292
Bump tj-actions/changed-files from 22.2 to 41 in /.github/workflows by @dependabot in #2300
add back dvclive to tests by @dberenbaum in #2280
Fixes bug in swapping weights when replacing with Transformer-Engine layers by @sudhakarsingh27 in #2305
Fix breakpoint API in test_script.py on TPU. by @vanbasten23 in #2263
make test_state_checkpointing device agnostic by @statelesshz in #2290
[deepspeed] documentation by @stas00 in #2296
Add more missing items by @muellerzr in #2309
Update docs: Add warning for device_map=None for load_checkpoint_and_dispatch by @PhilJd in #2308
[deepspeed] fix setting auto values for comm buffers by @stas00 in #2295
DeepSpeed refactoring by @pacman100 in #2313
Fix DeepSpeed related regression by @pacman100 in #2304
Update test_deepspeed.py by @pacman100 in #2323
Bring old seed technique back by @muellerzr in #2319
Fix batch_size sanity check in prepare_data_loader by @izhx in #2310
Params4bit added to bnb classes in set_module_tensor_to_device() by @poedator in #2315
Fix infer_auto_device_map when tied weights share the same prefix name by @fxmarty in #2324

New Contributors

@fxmarty made their first contribution in #2116
@RMSnow made their first contribution in #2225
@jxysoft made their first contribution in #2227
@vanbasten23 made their first contribution in #2242
@Justin900429 made their first contribution in #2257
@dependabot made their first contribution in #2300
@sudhakarsingh27 made their first contribution in #2305
@PhilJd made their first contribution in #2308
@izhx made their first contribution in #2310
@poedator made their first contribution in #2315

Full Changelog: v0.25.0...v0.26.0

accelerate 0.26.0 v0.26.0 - MS-AMP Support, Critical Regression Fixes, and More on Python PyPI