Hey everyone, we’ve updated Gemma 4 training and quants with many fixes. The bugs were universal, affected all packages and implementations, and did not originate from Unsloth. We identified and fixed them, and currently Gemma 4 training works properly only in Unsloth.
You only need 8GB of VRAM to train Gemma-4-E2B locally. Unsloth trains Gemma 4 ~1.5x faster with ~60% less VRAM than FA2 setups.
You can also train the 26B-A4B and 31B models, or train via Unsloth Studio. Studio and the notebooks work for Vision, Text, Audio and inference.
For more details, plus the guide and notebooks for training Gemma 4, see our blog: https://unsloth.ai/docs/models/gemma-4/train
Gemma 4 Training Fixes:
- Gradient accumulation no longer causes losses to explode. Previously you might see losses of 300 to 400 when they should be 10 to 15; Unsloth has this fixed.
- Fixed an IndexError that broke inference for the 26B and 31B models when using transformers.
- `use_cache=False` produced gibberish for E2B and E4B; see huggingface/transformers#45242.
- float16 audio: the -1e9 mask fill value overflows on float16.
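The float16 audio issue comes from hard-coding -1e9 as a mask fill value: float16 can only represent magnitudes up to about 65504, so -1e9 does not fit. A minimal stdlib sketch (not Unsloth's or transformers' actual code) showing why the constant overflows half precision:

```python
import struct

def fits_in_float16(x: float) -> bool:
    """Return True if x can be packed into IEEE 754 half precision."""
    try:
        struct.pack("e", x)  # 'e' = half-precision float format
        return True
    except OverflowError:
        return False

print(fits_in_float16(-65504.0))  # largest finite negative float16: True
print(fits_in_float16(-1e9))      # the old mask fill value: False
```

The usual fix is to derive the fill value from the dtype itself (e.g. `torch.finfo(dtype).min` in PyTorch) instead of a hard-coded -1e9.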
If you see losses higher than 13-15 (for example 100 or 300), gradient accumulation is most likely not being accounted for properly; we have fixed this in both Unsloth and Unsloth Studio.
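The symptom can be reproduced with plain arithmetic. This is an illustrative sketch with made-up numbers, not Unsloth's trainer internals: cross-entropy should be normalized once over all tokens in the accumulated batch, and skipping that normalization inflates the reported loss in proportion to the number of accumulation steps.

```python
# (sum of token losses, token count) for each accumulated microbatch;
# values are hypothetical, chosen to give a healthy per-token loss of ~11.5.
microbatch_losses = [
    (1200.0, 100),
    (950.0, 80),
    (1400.0, 120),
    (1050.0, 100),
]

# Correct: one normalization over the whole accumulated batch.
correct = sum(s for s, _ in microbatch_losses) / sum(n for _, n in microbatch_losses)

# Buggy: sum the per-microbatch mean losses without dividing by the number
# of accumulation steps, so the reported loss grows with the step count.
buggy = sum(s / n for s, n in microbatch_losses)

print(f"correct loss: {correct:.2f}")  # 11.50, in the healthy 10-15 range
print(f"buggy loss:   {buggy:.2f}")    # 46.04, ~4x too high with 4 steps
```

With a larger accumulation count (say 32 steps) the same bug turns a true loss of ~11 into the 300-400 range described above.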
Gemma 4 Quant Re-uploads
We also updated our Gemma 4 GGUFs, so you will need to re-download them. Once again, the quant issues are not related to and did not originate from Unsloth:
- CUDA: check for buffer overlap before fusing - CRITICAL fixes
- `<unused24>` tokens ggml-org/llama.cpp#21566
- kv-cache : support attention rotation for heterogeneous iSWA ggml-org/llama.cpp#21513
- vocab : add byte token handling to BPE detokenizer for Gemma4 ggml-org/llama.cpp#21488
- convert : set "add bos" == True for Gemma 4 ggml-org/llama.cpp#21500
- common : add gemma 4 specialized parser ggml-org/llama.cpp#21418
- llama-model: read final_logit_softcapping for Gemma 4 ggml-org/llama.cpp#21390
- llama: add custom newline split for Gemma 4 ggml-org/llama.cpp#21406
Unsloth Studio Updates
- Add speculative decoding support (ngram-mod, on by default)
- Updated llama.cpp binaries to the latest version, which includes all Gemma 4 fixes
- Fix Qwen3.5 and Gemma 4 training issues
- Enable exporting and saving of Gemma 4 models
- Harden sandbox security for terminal and python tools
- Let recipes use the model loaded in Chat
- Fix empty chat threads on navigation (and whenever switching tabs) and stabilize new chat flow
- Allow non-LLM recipes to run and move Data tab first in executions
- Reuse HF cached repo casing to prevent duplicate downloads
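On the speculative decoding item above: ngram-style drafting proposes candidate tokens by matching the most recent n-gram of context against earlier occurrences in the sequence, then lets the main model verify the draft. A minimal, self-contained sketch of the lookup idea (a hypothetical simplification, not Studio's ngram-mod implementation):

```python
def ngram_draft(tokens, n=2, max_draft=4):
    """Propose draft tokens by matching the last n tokens against the
    most recent earlier occurrence of the same n-gram in the sequence."""
    if len(tokens) < n:
        return []
    key = tuple(tokens[-n:])
    # Scan backwards so the most recent prior match wins.
    for i in range(len(tokens) - n - 1, -1, -1):
        if tuple(tokens[i:i + n]) == key:
            # Draft the tokens that followed that occurrence.
            return tokens[i + n:i + n + max_draft]
    return []

# The suffix (5, 9) appeared earlier, followed by 7, 3, 5, 9,
# so those tokens become the draft for the main model to verify.
seq = [5, 9, 7, 3, 5, 9]
print(ngram_draft(seq, n=2))  # -> [7, 3, 5, 9]
```

Drafting from repeated n-grams is cheap because it needs no extra model, which is why it is a reasonable default for repetitive text such as code or structured output.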
What's Changed
- fix(studio): lazy-import transformers in model_config to fix 5.x version switch by @rolandtannous in #4806
- fix: patch PEFT for Gemma4ClippableLinear in loader checkpoint path (fixes export) by @rolandtannous in #4807
- Fix/gemma4 install script by @Manan17 in #4815
- Fix/llama.cpp building by @mmathew23 in #4804
- Add tests for simplified llama.cpp install policy (from PR #4804) by @danielhanchen in #4817
- Differentiate web search and URL fetch in chat tool UI by @Shine1i in #4802
- Allow non-LLM recipes to run and move Data tab first in executions by @Shine1i in #4805
- studio: reuse HF cached repo casing to prevent duplicate downloads by @Imagineer99 in #4822
- fix(studio): ensure first chat tool call starts in session sandbox by @neodon in #4810
- fix(studio): harden sandbox security for terminal and python tools by @danielhanchen in #4827
- studio: add speculative decoding support (ngram-mod, on by default) by @danielhanchen in #4836
- Add Gemma 4 model sampling defaults by @danielhanchen in #4838
- Add tests for cache case resolution (from PR #4822) by @danielhanchen in #4823
- Bump minimum unsloth version to 2026.4.2 in install scripts by @danielhanchen in #4842
- Fix/studio colab button message: Add fallback message for Colab Studio button when proxy URL fails by @LeoBorcherding in #4866
- [Studio][Optimization]Add vision detection cache to is_vision_model() by @rolandtannous in #4853
- Add tests for is_vision_model() caching behaviour by @danielhanchen in #4855
- Remove Gemma-4 from FORCE_FLOAT32 by @danielhanchen in #4875
- fix: skip redundant HfFileSystem().glob() calls in loader.py by @rolandtannous in #4852
- fix(studio): custom folder scan fails to find GGUF variants when pointing directly at a model directory by @JYYYYYT in #4860
- Add unit tests for loader glob skip guard (from PR #4852) by @danielhanchen in #4854
- Studio: Fix empty chat threads on navigation and stabilize new chat flow by @Imagineer99 in #4872
- Bump minimum unsloth version to 2026.4.4 in install scripts by @danielhanchen in #4876
- split venv_t5 into tiered 5.3.0/5.5.0 and fix trust_remote_code by @rolandtannous in #4878
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #4879
- build(deps): bump oxc-parser from 0.121.0 to 0.123.0 in /studio/backend/core/data_recipe/oxc-validator in the npm-oxc-validator group by @dependabot[bot] in #4776
- Update dependabot.yml by @danielhanchen in #4915
- Let recipes use the model loaded in Chat by @Shine1i in #4840
- build(deps): bump the bun-frontend group across 1 directory with 16 updates by @dependabot[bot] in #4586
New Contributors
Full Changelog: v0.1.35-beta...v0.1.36-beta