accelerate 0.30.0 on Python PyPI

Core

We've simplified the tqdm wrapper to make it fully passthrough. There is no need to call tqdm(main_process_only, *args) anymore: it is now just tqdm(*args), and you can pass main_process_only as a kwarg.
We've added support for advanced optimizer usage:
- Schedule free optimizer introduced by Meta by @muellerzr in #2631
- LOMO optimizer introduced by OpenLMLab by @younesbelkada in #2695
Enable BF16 autocast to everything during FP8 and enable FSDP by @muellerzr in #2655
Support dataloader send_to_device calls to use non-blocking by @drhead in #2685
Allow gather_for_metrics to be more flexible by @SunMarc in #2710
Add CANN version info to the accelerate env command for NPU by @statelesshz in #2689
Add MLU rng state setter by @ArthurinRUC in #2664
Device-agnostic testing for hooks, utils, and big_modeling by @statelesshz in #2602
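
The tqdm change described at the top of this section is easiest to see as a signature. The sketch below is a minimal, stdlib-only illustration of a passthrough wrapper, not Accelerate's implementation: `make_passthrough`, `fake_bar`, and the `is_main_process` argument are hypothetical names introduced here, and the flag is assumed to be the `main_process_only` keyword named in #2654.

```python
from typing import Any, Callable


def make_passthrough(wrapped: Callable[..., Any], is_main_process: bool) -> Callable[..., Any]:
    """Wrap a progress-bar factory so every argument is forwarded unchanged."""

    def wrapper(*args: Any, main_process_only: bool = True, **kwargs: Any) -> Any:
        # Respect an explicit disable=True from the caller.
        disable = kwargs.pop("disable", False)
        # Otherwise, silence the bar on non-main processes when requested.
        if main_process_only and not disable:
            disable = not is_main_process
        return wrapped(*args, disable=disable, **kwargs)

    return wrapper


def fake_bar(iterable, desc=None, disable=False):
    """Hypothetical stand-in for tqdm.tqdm that records what it received."""
    return {"items": list(iterable), "desc": desc, "disable": disable}


# Simulate a worker that is not the main process.
bar = make_passthrough(fake_bar, is_main_process=False)
hidden = bar(range(3), desc="train")  # positional args pass straight through
shown = bar(range(3), desc="train", main_process_only=False)
```

Because the flag is keyword-only, positional arguments keep the meaning they have in plain tqdm, which is the point of making the wrapper passthrough. In Accelerate itself the equivalent call would presumably look like `tqdm(range(10), main_process_only=True)`.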

Documentation

Through collaboration between @fabianlim (lead contributor), @stas00, @pacman100, and @muellerzr, we have a new concept guide for FSDP and DeepSpeed that details how the two interoperate and explains clearly how each of them works. This was a monumental effort by @fabianlim to make everything as accurate as possible for users. I highly recommend reading this new documentation.
New distributed inference examples have been added thanks to @SunMarc in #2672
Fixed some docs for using internal trackers by @brentyi in #2650

DeepSpeed

Accelerate can now handle MoE models when using DeepSpeed, thanks to @pacman100 in #2662
Allow "auto" for gradient clipping in YAML by @regisss in #2649
Introduce a DeepSpeed-specific Docker image by @muellerzr in #2707. To use it, pull the gpu-deepspeed tag: docker pull huggingface/accelerate:cuda-deepspeed-nightly

Megatron

Megatron plugin can support NPU by @zhangsheng377 in #2667

Big Modeling

Add strict arg to load_checkpoint_and_dispatch by @SunMarc in #2641
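
As a rough illustration of what a strict flag usually means when loading a checkpoint, the sketch below compares the parameter names a model expects with those found in a checkpoint. It assumes the semantics of `strict` in PyTorch's `load_state_dict` (any missing or unexpected key is an error); `check_keys` is a hypothetical helper, not part of Accelerate, and these notes do not spell out the exact behaviour of #2641.

```python
from typing import Iterable, List, Tuple


def check_keys(
    model_keys: Iterable[str],
    checkpoint_keys: Iterable[str],
    strict: bool = False,
) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) keys; raise if strict and they differ."""
    expected, found = set(model_keys), set(checkpoint_keys)
    missing = sorted(expected - found)      # in the model, absent from the checkpoint
    unexpected = sorted(found - expected)   # in the checkpoint, unknown to the model
    if strict and (missing or unexpected):
        raise RuntimeError(f"missing keys: {missing}; unexpected keys: {unexpected}")
    return missing, unexpected


model_keys = ["linear.weight", "linear.bias"]
checkpoint_keys = ["linear.weight", "head.weight"]

# Non-strict: mismatches are only reported.
missing, unexpected = check_keys(model_keys, checkpoint_keys)

# Strict: the same mismatch becomes an error.
try:
    check_keys(model_keys, checkpoint_keys, strict=True)
    strict_failed = False
except RuntimeError:
    strict_failed = True
```

A strict load is useful when a silent partial load would be worse than a crash, for example when resuming training from a checkpoint saved with a different architecture.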

Bug Fixes

Fix up state with xla + performance regression by @muellerzr in #2634
Parenthesis on xpu_available by @muellerzr in #2639
Fix is_train_batch_min type in DeepSpeedPlugin by @yhna940 in #2646
Fix backend check by @jiqing-feng in #2652
Fix the rng states of sampler's generator to be synchronized for correct sharding of dataset across GPUs by @pacman100 in #2694
Block AMP for MPS device by @SunMarc in #2699
Fix multi-GPU training with bnb when the first GPU is not used by @SunMarc in #2714
Fixup free_memory to deal with garbage collection by @muellerzr in #2716
Fix sampler serialization failing by @SunMarc in #2723
Fix deepspeed offload device type in the arguments to be more accurate by @yhna940 in #2717
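
To see why the sampler fix in #2694 matters, consider a common sharding scheme in which every process shuffles the full index list and then keeps its own slice. The sketch below is a stdlib-only illustration under that assumption; `shard_indices` is a hypothetical helper and does not reproduce Accelerate's sampler logic.

```python
import random
from typing import List


def shard_indices(dataset_len: int, rank: int, world_size: int, seed: int) -> List[int]:
    """Shuffle all indices with the given seed, then keep every world_size-th one."""
    indices = list(range(dataset_len))
    random.Random(seed).shuffle(indices)
    return indices[rank::world_size]


world_size, dataset_len = 2, 8

# Same seed on every rank: the shards partition the dataset exactly.
synced = [shard_indices(dataset_len, r, world_size, seed=42) for r in range(world_size)]
synced_cover = sorted(i for shard in synced for i in shard)

# A different seed per rank: shards may overlap and skip some samples.
unsynced = [shard_indices(dataset_len, r, world_size, seed=r) for r in range(world_size)]
unsynced_cover = sorted(i for shard in unsynced for i in shard)
```

With a shared seed, `synced_cover` contains each index exactly once. With per-rank seeds nothing guarantees that, which is the kind of incorrect sharding a synchronized generator state avoids.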

Full Changelog

Schedule free optimizer support by @muellerzr in #2631
Fix up state with xla + performance regression by @muellerzr in #2634
Parenthesis on xpu_available by @muellerzr in #2639
add third-party device prefix to execution_device by @faaany in #2612
add strict arg to load_checkpoint_and_dispatch by @SunMarc in #2641
device agnostic testing for hooks&utils&big_modeling by @statelesshz in #2602
Docs fix for using internal trackers by @brentyi in #2650
Allow "auto" for gradient clipping in YAML by @regisss in #2649
Fix is_train_batch_min type in DeepSpeedPlugin by @yhna940 in #2646
Don't use deprecated Repository anymore by @Wauplin in #2658
Fix test_from_pretrained_low_cpu_mem_usage_measured failure by @yuanwu2017 in #2644
Add MLU rng state setter by @ArthurinRUC in #2664
fix backend check by @jiqing-feng in #2652
Megatron plugin can support NPU by @zhangsheng377 in #2667
Revert "fix backend check" by @muellerzr in #2669
tqdm: *args should come ahead of main_process_only by @rb-synth in #2654
Handle MoE models with DeepSpeed by @pacman100 in #2662
Fix deepspeed moe test with version check by @pacman100 in #2677
Pin DS...again.. by @muellerzr in #2679
fix backend check by @jiqing-feng in #2670
Deprecate tqdm args + slight logic tweaks by @muellerzr in #2673
Enable BF16 autocast to everything during FP8 + some tweaks to enable FSDP by @muellerzr in #2655
Fix the rng states of sampler's generator to be synchronized for correct sharding of dataset across GPUs by @pacman100 in #2694
Simplify test logic by @pacman100 in #2697
Add source code for DataLoader Animation by @muellerzr in #2696
Block AMP for MPS device by @SunMarc in #2699
Do a pip freeze during workflows by @muellerzr in #2704
add cann version info to command accelerate env by @statelesshz in #2689
Add version checks for the import of DeepSpeed moe utils by @pacman100 in #2705
Change dataloader send_to_device calls to non-blocking by @drhead in #2685
add distributed examples by @SunMarc in #2672
Add diffusers to req by @muellerzr in #2711
fix bnb multi gpu training by @SunMarc in #2714
allow gather_for_metrics to be more flexible by @SunMarc in #2710
Add Upcasting for FSDP in Mixed Precision. Add Concept Guide for FSDP and DeepSpeed. by @fabianlim in #2674
Segment out a deepspeed docker image by @muellerzr in #2707
Fixup free_memory to deal with garbage collection by @muellerzr in #2716
fix sampler serialization by @SunMarc in #2723
Fix sampler failing test by @SunMarc in #2728
Docs: Fix build main documentation by @SunMarc in #2729
Fix Documentation in FSDP and DeepSpeed Concept Guide by @fabianlim in #2725
Fix deepspeed offload device type by @yhna940 in #2717
FEAT: Add LOMO optimizer by @younesbelkada in #2695
Fix tests on main by @muellerzr in #2739

New Contributors

@brentyi made their first contribution in #2650
@regisss made their first contribution in #2649
@yhna940 made their first contribution in #2646
@Wauplin made their first contribution in #2658
@ArthurinRUC made their first contribution in #2664
@jiqing-feng made their first contribution in #2652
@zhangsheng377 made their first contribution in #2667
@rb-synth made their first contribution in #2654
@drhead made their first contribution in #2685

Full Changelog: v0.29.3...v0.30.0

accelerate 0.30.0 v0.30.0: Advanced optimizer support, MoE DeepSpeed support, add upcasting for FSDP, and more on Python PyPI
