Gradient Accumulation
Accelerate can now handle gradient accumulation for you: pass gradient_accumulation_steps=xxx when instantiating the Accelerator and put each step of your training loop under a with accelerator.accumulate(model): block. Accelerate will then handle the loss re-scaling and gradient accumulation for you, avoiding slowdowns in distributed training because gradients are only synced when you actually step. More details are in the documentation.
- Add gradient accumulation doc by @muellerzr in #511
- Make gradient accumulation work with dispatched dataloaders by @muellerzr in #510
- Introduce automatic gradient accumulation wrapper + fix a few test issues by @muellerzr in #484
Support for SageMaker Data parallelism
Accelerate now supports SageMaker's specific brand of data parallelism.
- SageMaker enhancements to allow custom docker image, input channels referring to s3/remote data locations and metrics logging by @pacman100 in #504
- SageMaker DP Support by @pacman100 in #494
What's new?
- Fix accelerate tests command by @sgugger in #528
- FSDP integration enhancements and fixes by @pacman100 in #522
- Warn user if no trackers are installed by @muellerzr in #524
- Fixup all example CI tests and properly fail by @muellerzr in #517
- fixing deepspeed multi-node launcher by @pacman100 in #514
- Add special Parameters modules support by @younesbelkada in #519
- Don't unwrap in save_state() by @cccntu in #489
- Fix a bug when reduce a tensor. by @wwhio in #513
- Add benchmarks by @sgugger in #506
- Fix DispatchDataLoader length when split_batches=True by @sgugger in #509
- Fix scheduler in gradient accumulation example by @muellerzr in #500
- update dataloader wrappers to have total_batch_size attribute by @pacman100 in #493
- Introduce automatic gradient accumulation wrapper + fix a few test issues by @muellerzr in #484
- add use_distributed property by @ZhiyuanChen in #487
- fixing fsdp autowrap functionality by @pacman100 in #475
- Use datasets 2.2.0 for now by @muellerzr in #481
- Rm gradient accumulation on TPU by @muellerzr in #479
- Revert "Pin datasets for now by @muellerzr in #477)"
- Pin datasets for now by @muellerzr in #477
- Some typos and cosmetic fixes by @douwekiela in #472
- Fix when TPU device check is ran by @muellerzr in #469
- Refactor Utility Documentation by @muellerzr in #467
- Add docbuilder to quality by @muellerzr in #468
- Expose some is_*_available utils in docs by @muellerzr in #466
- Cleanup CI Warnings by @muellerzr in #465
- Link CI slow runners to the commit by @muellerzr in #464
- Fix subtle bug in BF16 by @muellerzr in #463
- Include bf16 support for TPUs and CPUs, and a better check for if a CUDA device supports BF16 by @muellerzr in #462
- Handle bfloat16 weights in disk offload without adding memory overhead by @noamwies in #460
- Handle bfloat16 weights in disk offload by @sgugger in #460
- Raise a clear warning if a user tries to modify the AcceleratorState by @muellerzr in #458
- Right step point by @muellerzr in #459
- Better checks for if a TPU device exists by @muellerzr in #456
- Offload and modules with unused submodules by @sgugger in #442