Model quantization with bitsandbytes
You can now quantize any model (not just Transformers models) using Accelerate. This is mainly aimed at models with many linear layers. See the documentation for more information!
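As a rough illustration of the new utilities, here is a minimal sketch using `BnbQuantizationConfig` and `load_and_quantize_model` from `accelerate.utils` (names taken from the documentation; the toy `TinyNet` module and the checkpoint path are placeholders):

```python
import torch.nn as nn
from accelerate import init_empty_weights
from accelerate.utils import BnbQuantizationConfig, load_and_quantize_model

# Hypothetical model for illustration; any nn.Module with linear layers works.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1024, 1024)
        self.fc2 = nn.Linear(1024, 2)

    def forward(self, x):
        return self.fc2(self.fc1(x))

# Build the model skeleton without allocating real weights.
with init_empty_weights():
    empty_model = TinyNet()

# 8-bit quantization; the config also exposes load_in_4bit for fp4.
bnb_config = BnbQuantizationConfig(load_in_8bit=True)

# Load the checkpoint and quantize the linear layers on the fly.
quantized_model = load_and_quantize_model(
    empty_model,
    bnb_quantization_config=bnb_config,
    weights_location="path/to/checkpoint",  # placeholder path
    device_map="auto",
)
```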
Support for Ascend NPUs
Accelerate now supports Ascend NPUs; a minimal usage sketch follows the PR link below.
- Add Ascend NPU accelerator support by @statelesshz in #1676
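Once the hardware support is in place, existing scripts should run unchanged. A minimal sketch, assuming a machine with an Ascend NPU and the `torch_npu` extension installed (the automatic device selection and the `npu` device string are assumptions based on the PR, not tested output):

```python
# Minimal sketch: running Accelerate on an Ascend NPU.
# Assumes `torch_npu` is installed; auto-detection behavior is an assumption.
import torch
from accelerate import Accelerator

accelerator = Accelerator()   # should pick the NPU as the default device
print(accelerator.device)     # expected: something like npu:0

model = torch.nn.Linear(8, 2)
model = accelerator.prepare(model)  # moved to the NPU like any other device
```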
What's new?
Accelerate now requires Python 3.8+ and PyTorch 1.10+:
- 🚨🚨🚨 Spring cleaning: Python 3.8 🚨🚨🚨 by @muellerzr in #1661
- 🚨🚨🚨 Spring cleaning: PyTorch 1.10 🚨🚨🚨 by @muellerzr in #1662
- Update launch.mdx by @LiamSwayne in #1553
- Avoid double wrapping of all accelerate.prepare objects by @muellerzr in #1555
- Update README.md by @LiamSwayne in #1556
- Fix load_state_dict when there is one device and disk by @sgugger in #1557
- Fix tests not being run on multi-GPU nightly by @muellerzr in #1558
- Fix the typo when setting the "_accelerator_prepared" attribute by @Yura52 in #1560
- [core] Fix possibility to pass `NoneType` objects in `prepare` by @younesbelkada in #1561
- Reset dataloader end_of_dataloader at each iter by @sgugger in #1562
- Update big_modeling.mdx by @LiamSwayne in #1564
- [bnb] Fix failing int8 tests by @younesbelkada in #1567
- Update gradient sync docs to reflect importance of `optimizer.step()` by @dleve123 in #1565
- Update mixed precision integrations in README by @sgugger in #1569
- Raise error instead of warn by @muellerzr in #1568
- Introduce listify, fix tensorboard silently failing by @muellerzr in #1570
- Check for bak and expand docs on directory structure by @muellerzr in #1571
- Permanent solution by @muellerzr in #1577
- Fix the bug in xpu by @mingxiaoh in #1508
- Make sure that we only set is_accelerator_prepared on items accelerate actually prepares by @muellerzr in #1578
- Expand `prepare()` doc by @muellerzr in #1580
- Get Torch version using importlib instead of pkg_resources by @catwell in #1585
- Improve out-of-the-box performance when using mpirun to start DDP finetuning without `accelerate launch` by @sywangyi in #1575
- Update training_tpu.mdx by @LiamSwayne in #1582
- Return false if CUDA available by @muellerzr in #1581
- Fix test by @muellerzr in #1586
- Update checkpoint.mdx by @LiamSwayne in #1587
- FSDP updates by @pacman100 in #1576
- Integration tests by @muellerzr in #1593
- Add triggers for CI workflow by @muellerzr in #1597
- Remove asking xpu plugin for non-xpu devices by @abhilash1910 in #1594
- Reset end_of_dataloader for dataloader_dispatcher by @megavaz in #1609
- Fix for Arc GPUs by @abhilash1910 in #1615
- Ignore low_zero option when only one device is available by @sgugger in #1617
- Fix failing multinode tests by @muellerzr in #1616
- Fix tb issue by @muellerzr in #1623
- Fix workflow by @muellerzr in #1625
- Fix transformers sync bug with accumulate by @muellerzr in #1624
- Fix "Megatron is not installed. please build it from source." by @yuanwu2017 in #1636
- DeepSpeed z2/z1 state_dict bloating fix by @pacman100 in #1638
- Swap disable rich by @muellerzr in #1640
- Fix autocasting bug by @pacman100 in #1637
- Fix modeling low zero by @abhilash1910 in #1634
- Add skorch to runners by @muellerzr in #1646
- Change dispatch_model when we have only one device by @SunMarc in #1648
- Check for port usage before launch by @muellerzr in #1656
- [BigModeling] Add missing check for quantized models by @younesbelkada in #1652
- Bump integration by @muellerzr in #1658
- TIL by @muellerzr in #1657
- Docker cpu py version by @muellerzr in #1659
- [BigModeling] Final fix for dispatch int8 and fp4 models by @younesbelkada in #1660
- Remove safetensor dep on shard_checkpoint by @SunMarc in #1664
- Change the import place to avoid import error by @pacman100 in #1653
- Update broken Runhouse link in examples/README.md by @dongreenberg in #1668
- Add docs for saving Transformers models by @deppen8 in #1671
- Fix workflow CI by @muellerzr in #1690
- Update readme in examples by @statelesshz in #1678
- Fix nightly tests by @muellerzr in #1696
- Fixup docs by @muellerzr in #1697
- Improve quality errors by @muellerzr in #1698
- Move mixed precision wrapping ahead of DDP/FSDP wrapping by @ChenWu98 in #1682
- Deepcopy on Accelerator to return self by @muellerzr in #1694
- Skip tests when bnb isn't available by @muellerzr in #1706
- Fix launcher validation by @abhilash1910 in #1705
- Fixes for issue #1683: failed to run accelerate config in colab by @Erickrus in #1692
- Fix the bug where DataLoaderDispatcher gets stuck in an infinite wait when the dataset is an IterDataPipe during multi-process training by @yuxinyuan in #1709
- Keep old behavior by @muellerzr in #1716
- Optimize `get_scale` to reduce async calls by @muellerzr in #1718
- Remove duplicate code by @muellerzr in #1717
- New tactic by @muellerzr in #1719
- Add Comfy-UI by @pacman100 in #1723
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @LiamSwayne
- @mingxiaoh
  - Fix the bug in xpu (#1508)
- @statelesshz
- @ChenWu98
  - Move mixed precision wrapping ahead of DDP/FSDP wrapping (#1682)