Support for MS-AMP
This release adds support for MS-AMP (Microsoft Automatic Mixed Precision Library) to Accelerate as an alternative backend for FP8 training on appropriate hardware. It is the default backend of choice. Read more in the docs. Introduced in #2232 by @muellerzr
Core
In the prior release, a new sampler for the DataLoader was introduced. While it shows no statistical differences in results across seeds, repeating the same seed could produce a different end accuracy, which was scary to some users. We have now disabled this behavior by default, as it required some additional setup, and brought back the original implementation. To use the new sampling technique (which can provide more accurate repeated results), pass use_seedable_sampler=True to the Accelerator. We will be propagating this up to the Trainer soon.
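To illustrate what a seedable sampler buys you, here is a minimal pure-Python sketch. It is not Accelerate's implementation: the class ToySeedableSampler and the helper epoch_order are hypothetical names used only for this example. The idea shown is that the shuffle order depends only on the seed and the epoch, so repeating a run with the same seed replays the same data order.

```python
import random


class ToySeedableSampler:
    """Hypothetical stand-in for a seedable sampler (not Accelerate's class).

    Yields a permutation of range(num_samples) that is fully determined
    by (seed, epoch).
    """

    def __init__(self, num_samples, seed=42):
        self.num_samples = num_samples
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch):
        # Changing the epoch changes the shuffle, but in a reproducible way.
        self.epoch = epoch

    def __iter__(self):
        # A private RNG keeps the order independent of global random state.
        rng = random.Random(self.seed + self.epoch)
        indices = list(range(self.num_samples))
        rng.shuffle(indices)
        return iter(indices)

    def __len__(self):
        return self.num_samples


def epoch_order(seed, epoch, n=8):
    """Return the sample order for a given seed and epoch."""
    sampler = ToySeedableSampler(n, seed=seed)
    sampler.set_epoch(epoch)
    return list(sampler)


if __name__ == "__main__":
    # Two "runs" with the same seed see the same order in every epoch.
    for epoch in range(3):
        print(epoch, epoch_order(seed=0, epoch=epoch))
```

In Accelerate itself, the opt-in is the use_seedable_sampler=True argument to the Accelerator described above; the sketch only shows the underlying idea.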
Big Model Inference
- NPU support was added thanks to @statelesshz in #2222
- When generating an automatic device_map, we've made it possible to not return grouped key results if desired in #2233
- We now handle corner cases better when users pass device_map="cuda" etc. thanks to @younesbelkada in #2254
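For readers unsure what "grouped key results" means, the following is a small pure-Python sketch of the idea, assuming that grouping means collapsing child entries into their parent prefix when every child sits on the same device. The function group_device_map is a hypothetical helper written for this note, not part of Accelerate, and the real cleaning logic may differ in its details.

```python
def group_device_map(device_map):
    """Collapse child keys into their parent when all children share a device.

    Toy illustration only: assumes every key under a parent prefix is listed.
    """
    grouped = dict(device_map)
    changed = True
    while changed:
        changed = False
        # Candidate parents are the prefixes of all dotted keys.
        parents = {key.rsplit(".", 1)[0] for key in grouped if "." in key}
        # Try the deepest (longest) prefixes first.
        for parent in sorted(parents, key=len, reverse=True):
            children = [k for k in grouped if k.startswith(parent + ".")]
            devices = {grouped[k] for k in children}
            if len(devices) == 1:
                for k in children:
                    del grouped[k]
                grouped[parent] = devices.pop()
                changed = True
                break  # recompute parents after every collapse
    return grouped


ungrouped = {
    "encoder.layer.0": 0,
    "encoder.layer.1": 0,
    "decoder.layer.0": 0,
    "decoder.layer.1": "cpu",
}

if __name__ == "__main__":
    # The encoder collapses to one entry; the decoder stays split
    # because its layers live on different devices.
    print(group_device_map(ungrouped))
```

Keeping the ungrouped form is useful when you want to inspect or edit the placement of individual submodules before dispatching the model.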
FSDP and DeepSpeed
- Many improvements to the docs have been made thanks to @stas00. Along with this, we've made it easier to adjust the sharding strategy and other config values thanks to @pacman100 in #2288
- A regression introduced in Accelerate 0.23.0 made learning much slower on multi-GPU setups compared to a single GPU. #2304 has now fixed this thanks to @pacman100
- The DeepSpeed integration now also handles auto values better when making a configuration in #2313
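As a reminder of what auto values look like, below is a sketch of a DeepSpeed-style config written as a Python dict, with "auto" placeholders that the integration is expected to fill in at prepare time. The selection of keys is an assumption chosen for illustration, and find_auto_keys is a hypothetical helper for inspecting a config, not an Accelerate or DeepSpeed function.

```python
# Illustrative config: which keys you leave as "auto" depends on your setup.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "zero_optimization": {
        "stage": 3,
        # Communication buffer sizes that can be left for the integration to set.
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
    },
}


def find_auto_keys(config, prefix=""):
    """Return dotted paths of every entry whose value is the string "auto"."""
    found = []
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            found.extend(find_auto_keys(value, path))
        elif value == "auto":
            found.append(path)
    return found


if __name__ == "__main__":
    for path in find_auto_keys(ds_config):
        print(path)
```

Listing the auto entries this way can help when checking which values you are delegating and which you have pinned by hand.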
Bits and Bytes
- Params4bit added to bnb classes in set_module_tensor_to_device() by @poedator in #2315
Device Agnostic Testing
For developers, we've made it much easier to run the tests on different devices with no change to the code thanks to @statelesshz in #2123 and #2235
Bug Fixes
- Check notebook launcher for 3090+ by @muellerzr in #2212
- Fix dtype bug when offload_state_dict=True and dtype is specified by @fxmarty in #2116
- fix tqdm wrapper to print when process id ==0 by @kashif in #2223
- fix BFloat16 is not supported on MPS (#2226) by @jxysoft in #2227
- Fix MpDeviceLoaderWrapper not having attribute batch_sampler by @vanbasten23 in #2242
- [deepspeed] fix setting auto values for comm buffers by @stas00 in #2295
- Fix infer_auto_device_map when tied weights share the same prefix name by @fxmarty in #2324
- Fixes bug in swapping weights when replacing with Transformer-Engine layers by @sudhakarsingh27 in #2305
- Fix breakpoint API in test_script.py on TPU. by @vanbasten23 in #2263
- Bring old seed technique back by @muellerzr in #2319
Major Contributors
- @statelesshz for their work on device-agnostic testing and NPU support
- @stas00 for many doc fixes for DeepSpeed and FSDP
General Changelog
- add missing whitespace by @stas00 in #2206
- MNT Delete the delete doc workflows by @BenjaminBossan in #2217
- Update docker images by @muellerzr in #2213
- Add allgather check for xpu by @abhilash1910 in #2199
- Check notebook launcher for 3090+ by @muellerzr in #2212
- Fix dtype bug when offload_state_dict=True and dtype is specified by @fxmarty in #2116
- fix tqdm wrapper to print when process id ==0 by @kashif in #2223
- [data_loader] expand the error message by @stas00 in #2221
- Update the 'Frameworks using Accelerate' section to include Amphion by @RMSnow in #2225
- [Docs] Add doc for cpu/disk offload by @SunMarc in #2231
- device agnostic testing by @statelesshz in #2123
- Make cleaning optional for device map by @muellerzr in #2233
- Add npu support to big model inference by @statelesshz in #2222
- fix the DS failing test by @pacman100 in #2237
- Fix nb tests by @muellerzr in #2230
- fix BFloat16 is not supported on MPS (#2226) by @jxysoft in #2227
- Fix MpDeviceLoaderWrapper not having attribute batch_sampler by @vanbasten23 in #2242
- [Big-Modeling] Harmonize device check to handle corner cases by @younesbelkada in #2254
- Support log_images for aim tracker by @Justin900429 in #2257
- Integrate MS-AMP Support for FP8 as a seperate backend by @muellerzr in #2232
- refactor deepspeed dataloader prepare logic by @pacman100 in #2238
- device agnostic deepspeed&fsdp testing by @statelesshz in #2235
- Solve CUDA issues by @muellerzr in #2272
- Uninstall DVC in the Trainer tests by @muellerzr in #2271
- Rm DVCLive from test reqs as latest version causes failures by @muellerzr in #2279
- typo fix by @stas00 in #2276
- Add condition before using check_tied_parameters_on_same_device by @SunMarc in #2218
- [doc] FSDP improvements by @stas00 in #2274
- [deepspeed docs] auto-values aren't being covered by @stas00 in #2286
- Improve FSDP config usability by @pacman100 in #2288
- [doc] language fixes by @stas00 in #2292
- Bump tj-actions/changed-files from 22.2 to 41 in /.github/workflows by @dependabot in #2300
- add back dvclive to tests by @dberenbaum in #2280
- Fixes bug in swapping weights when replacing with Transformer-Engine layers by @sudhakarsingh27 in #2305
- Fix breakpoint API in test_script.py on TPU. by @vanbasten23 in #2263
- make test_state_checkpointing device agnostic by @statelesshz in #2290
- [deepspeed] documentation by @stas00 in #2296
- Add more missing items by @muellerzr in #2309
- Update docs: Add warning for device_map=None for load_checkpoint_and_dispatch by @PhilJd in #2308
- [deepspeed] fix setting auto values for comm buffers by @stas00 in #2295
- DeepSpeed refactoring by @pacman100 in #2313
- Fix DeepSpeed related regression by @pacman100 in #2304
- Update test_deepspeed.py by @pacman100 in #2323
- Bring old seed technique back by @muellerzr in #2319
- Fix batch_size sanity check in prepare_data_loader by @izhx in #2310
- Params4bit added to bnb classes in set_module_tensor_to_device() by @poedator in #2315
- Fix infer_auto_device_map when tied weights share the same prefix name by @fxmarty in #2324
New Contributors
- @fxmarty made their first contribution in #2116
- @RMSnow made their first contribution in #2225
- @jxysoft made their first contribution in #2227
- @vanbasten23 made their first contribution in #2242
- @Justin900429 made their first contribution in #2257
- @dependabot made their first contribution in #2300
- @sudhakarsingh27 made their first contribution in #2305
- @PhilJd made their first contribution in #2308
- @izhx made their first contribution in #2310
- @poedator made their first contribution in #2315
Full Changelog: v0.25.0...v0.26.0