huggingface/accelerate v1.4.0 on GitHub

`torchao` FP8, initial Tensor Parallel support, and memory leak fixes

`torchao` FP8

This release introduces a new FP8 API and adds a new backend: `torchao`. To use it, pass `AORecipeKwargs` to the `Accelerator` while setting `mixed_precision="fp8"`. This is initial support; as it matures, we will incorporate more into it (such as `accelerate config`/YAML support) in future releases. Benchmark examples are available in the repository.
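As a minimal sketch of the usage described above, the snippet below builds an `Accelerator` with the torchao FP8 backend. The helper names `fp8_backend_available` and `build_fp8_accelerator` are hypothetical and exist only for illustration; the sketch assumes `accelerate` >= 1.4.0 and `torchao` are installed and that the default `AORecipeKwargs()` recipe is acceptable.

```python
import importlib.util


def fp8_backend_available() -> bool:
    """Return True when both accelerate and torchao can be imported."""
    return all(
        importlib.util.find_spec(name) is not None
        for name in ("accelerate", "torchao")
    )


def build_fp8_accelerator():
    """Create an Accelerator that uses the torchao FP8 backend (illustrative helper)."""
    # Imported lazily so this module loads even without the optional packages.
    from accelerate import Accelerator
    from accelerate.utils import AORecipeKwargs

    # Default recipe; passing the handler selects torchao for FP8 training.
    return Accelerator(mixed_precision="fp8", kwargs_handlers=[AORecipeKwargs()])
```

The returned object is then used as usual, for example `model, optimizer = accelerator.prepare(model, optimizer)`. FP8 training also requires hardware that supports it, so checking `fp8_backend_available()` first only rules out missing packages, not unsupported devices.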

TensorParallel

We have initial support for an in-house solution to TP when working with accelerate dataloaders. Check out the PR (#3173) for details.

Bug fixes

fix triton version check by @faaany in #3345
fix torch_dtype in estimate memory by @SunMarc in #3383
works for fp8 with deepspeed by @XiaobingSuper in #3361
[memory leak] Replace GradientState -> DataLoader reference with weakrefs by @tomaarsen in #3391
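
To illustrate the idea behind the weakref fix listed above, here is a minimal, generic sketch of a long-lived object that tracks other objects through weak references, so it does not keep them alive. `Loader` and `SharedState` are hypothetical stand-ins, not accelerate's actual `DataLoader` or `GradientState` code.

```python
import gc
import weakref


class Loader:
    """Stand-in for a dataloader object (hypothetical)."""

    def __init__(self, name):
        self.name = name


class SharedState:
    """Stand-in for a long-lived singleton that tracks active loaders."""

    def __init__(self):
        self._loader_refs = []

    def add(self, loader):
        # Store a weak reference so the state does not keep the loader alive.
        self._loader_refs.append(weakref.ref(loader))

    def live_loaders(self):
        # Dereference each weakref and skip those whose target was collected.
        return [obj for ref in self._loader_refs if (obj := ref()) is not None]


state = SharedState()
loader = Loader("train")
state.add(loader)
alive_before = len(state.live_loaders())

del loader    # drop the only strong reference
gc.collect()  # make collection explicit for non-refcounting interpreters
alive_after = len(state.live_loaders())
```

With a strong reference stored in `_loader_refs` instead, the loader would stay reachable for as long as the state object lives, which is the kind of leak a weak reference avoids.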

What's Changed

fix triton version check by @faaany in #3345
[tests] enable BNB test cases in tests/test_quantization.py on XPU by @faaany in #3349
[Dev] Update release directions by @muellerzr in #3352
[tests] make cuda-only test work on other hardware accelerators by @faaany in #3302
[tests] remove require_non_xpu test markers by @faaany in #3301
Support more functionalities for MUSA backend by @fmo-mt in #3359
[tests] enable more bnb tests on XPU by @faaany in #3350
feat: support tensor parallel & Data loader by @kmehant in #3173
DeepSpeed github repo move sync by @stas00 in #3376
[tests] Fix bnb cpu error by @faaany in #3351
fix torch_dtype in estimate memory by @SunMarc in #3383
works for fp8 with deepspeed by @XiaobingSuper in #3361
fix: typos in documentation files by @maximevtush in #3388
[examples] upgrade code for seed setting by @faaany in #3387
[memory leak] Replace GradientState -> DataLoader reference with weakrefs by @tomaarsen in #3391
add xpu check in get_quantized_model_device_map by @faaany in #3397
Torchao float8 training by @muellerzr in #3348

New Contributors

@kmehant made their first contribution in #3173
@XiaobingSuper made their first contribution in #3361
@maximevtush made their first contribution in #3388

Full Changelog: v1.3.0...v1.4.0
