github huggingface/accelerate v1.9.0
v1.9.0: Trackio support, Model loading speedup, Minor distributed improvements

Trackio tracker support

We've added support for trackio, a lightweight, 💯 free experiment tracking Python library built on top of 🤗 Datasets and Spaces.

Main features are:

  • Local-first design: dashboard runs locally by default. You can also host it on Spaces by specifying a space_id.
  • Persists logs locally (or in a private Hugging Face Dataset)
  • Visualize experiments with a Gradio dashboard locally (or on Hugging Face Spaces)
  • Everything here, including hosting on Hugging Face Spaces, is free!

To use it with accelerate, set log_with="trackio" and initialize the trackers:

accelerator = Accelerator(log_with="trackio")
config = {"learning_rate": 0.001, "batch_size": 32}
# pass init_kwargs in order to host the dashboard on Spaces
init_kwargs = {"trackio": {"space_id": "hf_username/space_name"}}
accelerator.init_trackers("example_project", config=config, init_kwargs=init_kwargs)

Thanks @pcuenca for the integration!

Model loading speedup when relying on set_module_tensor_to_device

Setting a tensor while clearing the cache is very slow, so we added a clear_device option to disable it.
Another small optimization is using non_blocking everywhere and syncing only once, just before returning control to the user. This makes loading slightly faster.

  • Speedup model loading by 4-5x in Diffusers ⚑ by @a-r-r-o-w in #3674
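A minimal sketch of the pattern behind this speedup (not the library's actual internals): queue every copy with non_blocking=True and synchronize once at the end, instead of paying a sync or cache clear per tensor. It also runs on CPU, where the copies are simply synchronous.

```python
import torch

def move_tensors(tensors, device):
    """Copy all tensors with non_blocking=True, syncing only once at the end."""
    moved = [t.to(device, non_blocking=True) for t in tensors]
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # single sync just before returning control
    return moved

moved = move_tensors([torch.ones(2), torch.zeros(3)], "cpu")
print([t.shape[0] for t in moved])  # [2, 3]
```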

FSDP, DeepSpeed, FP8 minor improvements

  • Add support for e5m2 and default to hybrid when launcher is used by @IlyasMoutawwakil in #3640
  • Fix FP8 tests, enable FP8 to be used without direct Accelerator() configuring by @pstjohn in #3677
  • Bunch of FSDP improvements by @S1ro1 in #3671
  • Fix: properly error when DDP + Dtensor model by @S1ro1 in #3629
  • Fix fsdp2 example typo by @shimizust in #3657
  • Added a check in no_sync() to avoid errors when using deepspeed zero2/3 by @xliu0105 in #3656

🚨🚨🚨 Breaking changes 🚨🚨🚨

find_executable_batch_size() no longer halves the batch size after every OOM. Instead, it multiplies the batch size by 0.9. This should help users waste less GPU capacity.

  • β€œStop Halving My Batch!” Β· Default back-off 0.5 β†’ 0.9 by @SunMarc in #3684

Full Changelog: v1.8.1...v1.9.0
