github unslothai/unsloth v0.1.36-beta
Gemma 4 Fixes

8 hours ago

Hey everyone, we’ve updated Gemma 4 training and quants with many fixes. The bugs were universal, affecting all packages and implementations, and did not originate from Unsloth. We identified and fixed them, and Gemma 4 training currently works properly only in Unsloth.

You need only 8GB of VRAM to train Gemma-4-E2B locally. Unsloth trains Gemma 4 ~1.5x faster with ~60% less VRAM than FA2 (Flash Attention 2) setups.

You can also train 26B-A4B and 31B locally, or train via Unsloth Studio. Studio and the notebooks support Vision, Text, Audio and inference.
For more details, plus a guide and notebooks on training Gemma 4, see our blog: https://unsloth.ai/docs/models/gemma-4/train

Gemma 4 Training Fixes:

For full details on each fix, see our blog.

  1. Gradient accumulation no longer causes losses to explode. Previously you might see losses of 300 to 400 when they should be around 10 to 15; Unsloth has this fixed.
  2. IndexError for 26B and 31B: inference for these models fails when using transformers. We fixed it.
  3. use_cache=False produced gibberish for E2B and E4B; see huggingface/transformers#45242.
  4. Audio in float16: the -1e9 masking value overflows in float16 precision.
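The float16 overflow in fix 4 is easy to demonstrate: float16 can only represent magnitudes up to about 65504, so an additive mask of -1e9 (a common way to zero out attention positions) overflows to -inf. This is a generic numpy illustration of the failure mode, not the actual model code:

```python
import numpy as np

# float16's largest finite magnitude is 65504, so -1e9 overflows to -inf.
mask = np.float16(-1e9)
print(np.isinf(mask))  # True -- the mask is no longer a finite number

# A safe large-negative mask stays finite in float16:
safe = np.finfo(np.float16).min  # -65504.0
print(np.isinf(np.float16(safe)))  # False
```

Once a mask value becomes -inf, downstream operations like softmax can produce NaNs, which is why this needed a fix rather than just a warning.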
Chart: Transformers vs Unsloth training loss

If you see losses higher than 13-15 (like 100 or 300), gradient accumulation is most likely not being accounted for properly. We have fixed this in Unsloth and Unsloth Studio.
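One common way gradient accumulation mis-scales the loss (a toy illustration, not Unsloth's actual fix) is averaging the per-mini-batch mean losses, which weights every mini-batch equally even when token counts differ wildly. The correct approach weights every token equally across the whole accumulated step:

```python
# Two accumulated mini-batches with different token counts and
# different per-token losses (hypothetical numbers for illustration).
batches = [
    [4.0] * 10,    # 10 tokens, per-token loss 4.0
    [2.0] * 1000,  # 1000 tokens, per-token loss 2.0
]

# Naive: average the per-batch means -- the tiny batch gets equal weight.
naive = sum(sum(b) / len(b) for b in batches) / len(batches)

# Correct: divide the summed loss by the total token count.
correct = sum(sum(b) for b in batches) / sum(len(b) for b in batches)

print(naive)    # 3.0 -- inflated by the small batch
print(correct)  # ~2.02 -- dominated by the 1000-token batch, as it should be
```

The naive scheme overstates the loss whenever small mini-batches happen to have high per-token losses, which is one reason mis-handled accumulation can report wildly inflated values.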

Gemma 4 Quant Re-uploads

We also updated our Gemma 4 GGUFs, so you will need to re-download them. Once again, the quant issues did not originate from Unsloth:

  1. CUDA: check for buffer overlap before fusing - CRITICAL fixes <unused24> tokens ggml-org/llama.cpp#21566
  2. kv-cache : support attention rotation for heterogeneous iSWA ggml-org/llama.cpp#21513
  3. vocab : add byte token handling to BPE detokenizer for Gemma4 ggml-org/llama.cpp#21488
  4. convert : set "add bos" == True for Gemma 4 ggml-org/llama.cpp#21500
  5. common : add gemma 4 specialized parser ggml-org/llama.cpp#21418
  6. llama-model: read final_logit_softcapping for Gemma 4 ggml-org/llama.cpp#21390
  7. llama: add custom newline split for Gemma 4 ggml-org/llama.cpp#21406

Unsloth Studio Updates

  • Add speculative decoding support (ngram-mod, on by default)
  • Update llama.cpp binaries to the latest version, which includes all the Gemma 4 fixes
  • Fix Qwen3.5 and Gemma 4 training issues
  • Enable exporting and saving of Gemma 4 models
  • Harden sandbox security for terminal and python tools
  • Let recipes use the model loaded in Chat
  • Fix empty chat threads on navigation (and whenever switching tabs) and stabilize new chat flow
  • Allow non-LLM recipes to run and move Data tab first in executions
  • Reuse HF cached repo casing to prevent duplicate downloads
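N-gram speculative decoding (as in the "ngram-mod" mode above) drafts tokens cheaply by looking for an earlier occurrence of the most recent n-gram in the context and copying what followed it, then letting the model verify the draft. This toy sketch shows the drafting step only; the function name and parameters are hypothetical and this is not Unsloth Studio's implementation:

```python
def draft_ngram(tokens, n=2, max_draft=4):
    """Propose up to max_draft tokens by finding the most recent earlier
    occurrence of the last n tokens and copying what followed it."""
    if len(tokens) < n:
        return []
    key = tokens[-n:]
    # Scan backwards over earlier positions (excluding the suffix itself).
    for i in range(len(tokens) - n - 1, -1, -1):
        if tokens[i:i + n] == key:
            return tokens[i + n:i + n + max_draft]
    return []  # no match: fall back to normal decoding

# The context ends in [2, 3], which previously appeared at position 1,
# so the draft copies the tokens that followed that occurrence.
print(draft_ngram([1, 2, 3, 4, 2, 3], n=2))  # [4, 2, 3]
```

Drafted tokens are then checked against the real model in a single forward pass; accepted prefixes are kept and the rest discarded, which is why this speeds up generation without changing outputs.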

What's Changed

New Contributors

Full Changelog: v0.1.35-beta...v0.1.36-beta
