github unslothai/unsloth October-2024
Gradient Accumulation Fix


We fixed a gradient accumulation bug that was originally reported back in 2021 and recently rediscovered. Read more in our blog post: https://unsloth.ai/blog/gradient

We have a Colab Notebook for Llama 3.2 using the fixed trainer and a Kaggle Notebook as well.

In theory, training with a per-device batch size of bsz and ga gradient accumulation steps should be equivalent to full-batch training with an effective batch size of bsz * ga, but in practice the training losses do not match up.
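
The mismatch comes from how the loss is normalized. With naive accumulation, each mini-batch's cross-entropy loss is divided by that mini-batch's own non-padded token count before the per-step losses are averaged, so short mini-batches get over-weighted; full-batch training divides once by the total token count. Here is a minimal sketch with made-up numbers (not Unsloth's actual code) of the discrepancy and the corrected normalization:

loss_sums    = [12.0, 40.0]   # summed cross-entropy loss per accumulated mini-batch
token_counts = [4.0, 20.0]    # non-padded tokens per mini-batch

# Full-batch objective: one mean over every token in the effective batch.
full_batch = sum(loss_sums) / sum(token_counts)                               # 52 / 24 ≈ 2.17

# Naive accumulation: average the per-mini-batch means, which over-weights
# the mini-batch with fewer tokens.
naive = sum(l / t for l, t in zip(loss_sums, token_counts)) / len(loss_sums)  # (3 + 2) / 2 = 2.5

# Fix: normalize every mini-batch by the token count of the whole
# accumulated batch, which recovers the full-batch loss exactly.
fixed = sum(l / sum(token_counts) for l in loss_sums)                         # 52 / 24 ≈ 2.17

print(full_batch, naive, fixed)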

We fixed it in Unsloth!

To use Unsloth's fixed trainer with gradient accumulation, use:

from unsloth import unsloth_train
# trainer_stats = trainer.train() << Buggy if using gradient accumulation
trainer_stats = unsloth_train(trainer) # << Fixed gradient accumulation
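
For context, here is a fuller sketch of where the fixed call slots into a typical QLoRA fine-tuning run with gradient accumulation. The model name, dataset, and hyperparameters are illustrative only, and the exact SFTTrainer arguments may vary with your trl version:

from unsloth import FastLanguageModel, unsloth_train
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

max_seq_length = 2048

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-1B-Instruct",  # example model
    max_seq_length = max_seq_length,
    load_in_4bit = True,
)
model = FastLanguageModel.get_peft_model(  # attach LoRA adapters
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = load_dataset("imdb", split = "train"),  # any dataset with a "text" column
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,   # effective batch size = 2 * 4 = 8
        max_steps = 60,
        learning_rate = 2e-4,
        output_dir = "outputs",
    ),
)

# trainer.train()                        # buggy loss normalization with gradient accumulation
trainer_stats = unsloth_train(trainer)   # Unsloth's fixed gradient accumulation path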

Please update Unsloth on local machines (not needed on Colab / Kaggle) via:

pip uninstall unsloth -y
pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

Read our blog post: https://unsloth.ai/blog/gradient for more details!

What's Changed

New Contributors

Full Changelog: September-2024...October-2024
