accelerate 0.12.0 on Python PyPI

New documentation

The whole documentation has been revamped. Go take a look!

Complete revamp of the docs by @muellerzr in #495

New gather_for_metrics method

When doing distributed evaluation, the dataloader loops back to the beginning of the dataset so that the batches form a round multiple of the number of processes. As a result, the gathered predictions can be slightly longer than the dataset, which used to require manual truncation. This is now handled behind the scenes if you replace the gather you did in evaluation with gather_for_metrics.

Reenable Gather for Metrics by @muellerzr in #590
Fix gather_for_metrics by @muellerzr in #578
Add a gather_for_metrics capability by @muellerzr in #540
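
To illustrate why the gathered predictions can be longer than the dataset, here is a minimal pure-Python sketch of the wrap-around and truncation described above. It is a simplified model, not Accelerate's implementation: the helper names (shard_with_wraparound, gather_all) and the sharding order are assumptions made for the example, and sample indices stand in for predictions.

```python
def shard_with_wraparound(num_samples, num_processes, batch_size):
    """Give each process its sample indices, looping back to the start of the
    dataset so that every process gets the same number of full batches."""
    step = num_processes * batch_size          # samples consumed per global step
    padded_len = -(-num_samples // step) * step  # round up to a multiple of step
    indices = [i % num_samples for i in range(padded_len)]
    shards = [[] for _ in range(num_processes)]
    for start in range(0, padded_len, step):
        for rank in range(num_processes):
            lo = start + rank * batch_size
            shards[rank].extend(indices[lo:lo + batch_size])
    return shards


def gather_all(shards, batch_size):
    """Concatenate the per-process batches step by step (a stand-in for a
    distributed gather; hypothetical helper for this sketch)."""
    gathered = []
    for s in range(len(shards[0]) // batch_size):
        for shard in shards:
            gathered.extend(shard[s * batch_size:(s + 1) * batch_size])
    return gathered


num_samples = 10
shards = shard_with_wraparound(num_samples, num_processes=4, batch_size=2)
gathered = gather_all(shards, batch_size=2)

# 16 entries for a 10-sample dataset: the last 6 are repeats from the start.
print(len(gathered), gathered)

# The manual fix that used to be needed: drop the duplicated tail.
truncated = gathered[:num_samples]
print(truncated)
```

In this toy model the duplicates always land at the end of the gathered list, so slicing to the dataset length recovers one prediction per sample. That slicing is the step gather_for_metrics is meant to take care of for you.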

Balanced device maps

When loading big models for inference, device_map="auto" used to fill the GPUs sequentially, making it hard to use a batch size > 1. It now balances the weights evenly across the GPUs, so if you have more GPU space than the model size, you can run predictions with a bigger batch size!
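
The difference between sequential and balanced placement can be shown with a toy model. The sketch below is not Accelerate's algorithm: the function names, the layer sizes, and the GPU capacities (in arbitrary units) are all assumptions chosen for illustration. It only shows how spreading the weights leaves free memory on every GPU instead of nearly filling the first one.

```python
def fill_in_order(layer_sizes, budgets):
    """Assign layers to devices in order, moving to the next device once the
    current device's budget would be exceeded."""
    device_map, device, used = {}, 0, 0
    for name, size in layer_sizes.items():
        while device < len(budgets) and used + size > budgets[device]:
            device, used = device + 1, 0
        if device == len(budgets):
            raise ValueError(f"{name} does not fit on the available devices")
        device_map[name] = device
        used += size
    return device_map


def balanced_budgets(layer_sizes, capacities):
    """Cap every device except the last at an even share of the model size.
    The last device keeps its full capacity to absorb rounding leftovers."""
    target = -(-sum(layer_sizes.values()) // len(capacities))  # ceiling division
    return [min(cap, target) for cap in capacities[:-1]] + [capacities[-1]]


def free_memory(layer_sizes, device_map, capacities):
    """Memory left on each device after placing the weights."""
    free = list(capacities)
    for name, device in device_map.items():
        free[device] -= layer_sizes[name]
    return free


# Hypothetical model: 8 layers of size 2 on two GPUs of capacity 24.
layer_sizes = {f"layer_{i}": 2 for i in range(8)}
capacities = [24, 24]

sequential = fill_in_order(layer_sizes, capacities)
balanced = fill_in_order(layer_sizes, balanced_budgets(layer_sizes, capacities))

print(free_memory(layer_sizes, sequential, capacities))  # [8, 24]
print(free_memory(layer_sizes, balanced, capacities))    # [16, 16]
```

With sequential filling, the first GPU holds the whole model and has the least room left for activations, which is what limits the batch size. With the balanced budgets, each GPU keeps the same headroom.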

M1 GPU support

Accelerate now supports M1 GPUs. To learn more about how to set up your environment, see the documentation.

M1 GPU mps device integration by @pacman100 in #596

What's new?

Small fixed for balanced device maps by @sgugger in #583
Add balanced option for auto device map creation by @sgugger in #534
fixing deepspeed slow tests issue by @pacman100 in #604
add more conditions on casting by @younesbelkada in #606
Remove redundant .run in WandBTracker. by @zh-plus in #605
Fix some typos + wordings by @muellerzr in #603
reorg of test scripts and minor changes to tests by @pacman100 in #602
Move warning by @muellerzr in #598
Shorthand way to grab a tracker by @muellerzr in #594
Pin deepspeed by @muellerzr in #595
Improve docstring by @muellerzr in #591
TESTS! by @muellerzr in #589
Fix DispatchDataloader by @sgugger in #588
Use main_process_first in the examples by @muellerzr in #581
Skip and raise NotImplementedError for gather_for_metrics for now by @muellerzr in #580
minor FSDP launcher fix by @pacman100 in #579
Refine test in set_module_tensor_to_device by @sgugger in #577
Fix set_module_tensor_to_device by @sgugger in #576
Add 8 bit support - chapter II by @younesbelkada in #539
Fix tests, add wandb to gitignore by @muellerzr in #573
Fix step by @muellerzr in #572
Speed up main CI by @muellerzr in #571
ccl version check and import different module according to version by @sywangyi in #567
set default num_cpu_threads_per_process to improve oob performance by @sywangyi in #562
Add a tqdm helper by @muellerzr in #564
Rename actions to be a bit more accurate by @muellerzr in #568
Fix clean by @muellerzr in #569
enhancements and fixes for FSDP and DeepSpeed by @pacman100 in #532
fix: saving model weights by @csarron in #556
add on_main_process decorators by @ZhiyuanChen in #488
Update imports.py by @KimBioInfoStudio in #554
unpin datasets by @lhoestq in #563
Create good defaults in accelerate launch by @muellerzr in #553
Fix a few minor issues with example code in docs by @BenjaminBossan in #551
deepspeed version 0.6.7 fix by @pacman100 in #544
Rename test extras to testing by @muellerzr in #545
Add production testing + fix failing CI by @muellerzr in #547
Add a gather_for_metrics capability by @muellerzr in #540
Allow for kwargs to be passed to trackers by @muellerzr in #542
Add support for downcasting bf16 on TPUs by @muellerzr in #523
Add more documentation for device maps computations by @sgugger in #530
Restyle prepare one by @muellerzr in #531
Pick a better default for offload_state_dict by @sgugger in #529
fix some parameter setting does not work for CPU DDP and bf16 fail in… by @sywangyi in #527
Fix accelerate tests command by @sgugger in #528

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@sywangyi
- ccl version check and import different module according to version (#567)
- set default num_cpu_threads_per_process to improve oob performance (#562)
- fix some parameter setting does not work for CPU DDP and bf16 fail in… (#527)
@ZhiyuanChen
- add on_main_process decorators (#488)

accelerate 0.12.0 v0.12.0 New doc, gather_for_metrics, balanced device map and M1 support on Python PyPI