Trackio tracker support
We've added support for trackio, a lightweight, 💯 free experiment tracking Python library built on top of 🤗 Datasets and Spaces.
Main features are:
- Local-first design: the dashboard runs locally by default. You can also host it on Spaces by specifying a space_id.
- Persists logs locally (or in a private Hugging Face Dataset)
- Visualize experiments with a Gradio dashboard locally (or on Hugging Face Spaces)
- Everything here, including hosting on Hugging Face, is free!
To use it with accelerate, you need to set log_with and initialize the trackers:
from accelerate import Accelerator

accelerator = Accelerator(log_with="trackio")
config = {"learning_rate": 0.001, "batch_size": 32}
# Pass init_kwargs to host the dashboard on Spaces
init_kwargs = {"trackio": {"space_id": "hf_username/space_name"}}
accelerator.init_trackers("example_project", config=config, init_kwargs=init_kwargs)

Thanks @pcuenca for the integration!
Model loading speedup when relying on set_module_tensor_to_device
Setting a tensor while clearing the cache is very slow, so we added a clear_device option to disable the cache clearing.
Another small optimization is using non_blocking everywhere and syncing just before returning control to the user. This makes the loading slightly faster.
- Speedup model loading by 4-5x in Diffusers ⚡ by @a-r-r-o-w in #3674
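The non_blocking idea can be illustrated with a minimal sketch in plain PyTorch. This is not the code used inside accelerate; the helper name `move_state_to_device` is hypothetical, and the example only shows the general pattern of issuing all copies without waiting and synchronizing once at the end.

```python
import torch


def move_state_to_device(state, device):
    """Hypothetical helper: copy every tensor with non_blocking=True,
    then synchronize once before handing the tensors back."""
    target = torch.device(device)
    # Queue all copies without waiting for each one to finish.
    moved = {name: t.to(target, non_blocking=True) for name, t in state.items()}
    # A single sync just before returning, so the caller sees complete data.
    if target.type == "cuda" and torch.cuda.is_available():
        torch.cuda.synchronize()
    return moved


# On a CPU-only machine, non_blocking has no effect and the sync is skipped.
state = {"weight": torch.ones(2, 2), "bias": torch.zeros(2)}
loaded = move_state_to_device(state, "cpu")
```

Asynchronous host-to-GPU copies generally need pinned host memory to overlap; with pageable memory the calls may still behave synchronously.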
FSDP, DeepSpeed, FP8 minor improvements
- Add support for e5e2 and default to hybrid when launcher is used by @IlyasMoutawwakil in #3640
- Fix FP8 tests, enable FP8 to be used without direct Accelerator() configuring by @pstjohn in #3677
- Bunch of FSDP improvements by @S1ro1 in #3671
- Fix: properly error when DDP + Dtensor model by @S1ro1 in #3629
- Fix fsdp2 example typo by @shimizust in #3657
- Added a check in no_sync() to avoid errors when using deepspeed zero2/3 by @xliu0105 in #3656
🚨🚨🚨 Breaking changes 🚨🚨🚨
find_executable_batch_size() will no longer halve the batch size after every OOM. Instead, we multiply the batch size by 0.9. This should help users avoid wasting GPU capacity.
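To see why the gentler decay wastes less capacity, here is a small standalone simulation in plain Python. It does not call accelerate; the function `batch_size_schedule` and the `max_fitting` threshold are hypothetical stand-ins for "the largest batch size that does not OOM", and truncating to an integer after each multiplication is an assumption about the rounding.

```python
def batch_size_schedule(start, max_fitting, factor):
    """Hypothetical simulation: shrink the batch size by `factor` after each
    simulated OOM until it fits, returning every size that was tried."""
    tried = []
    batch_size = start
    while True:
        tried.append(batch_size)
        if batch_size <= max_fitting:
            return tried  # this size "fits", so the search stops here
        # Assumption: the reduced size is truncated to an integer.
        batch_size = int(batch_size * factor)
        if batch_size == 0:
            raise RuntimeError("No executable batch size found")


# Old behaviour (halving) vs. new behaviour (multiply by 0.9),
# assuming at most 100 samples fit on the device.
print(batch_size_schedule(128, 100, 0.5))  # [128, 64]
print(batch_size_schedule(128, 100, 0.9))  # [128, 115, 103, 92]
```

In this toy scenario halving settles on 64 while the 0.9 factor settles on 92, at the cost of two extra retries.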
What's Changed
- [typo] shards instead of shard by @SunMarc in #3645
- Docs: Fix typos in gradient accumulation guide by @kilavvy in #3649
- xpu enablement on left cases by @yao-matrix in #3654
- unpin datasets in examples requirements by @SunMarc in #3681
- fix: wandb config not saved in offline mode by @ved1beta in #3648
- accelerate/data_loader.py: do not yield if the base_dataloader is empty by @0xnightwind in #3659
- warn for invalid keys by @ved1beta in #3613
- Update Gaudi runner image to latest SynapseAI and enable previously disabled tests by @IlyasMoutawwakil in #3653
New Contributors
- @kilavvy made their first contribution in #3649
- @shimizust made their first contribution in #3657
- @xliu0105 made their first contribution in #3656
- @0xnightwind made their first contribution in #3659
Full Changelog: v1.8.1...v1.9.0
