Google releases Gemma 4 with four new models: E2B, E4B, 26B-A4B, 31B.
- You can now run and train the Gemma 4 models in Unsloth. Guide / Blog: https://unsloth.ai/docs/models/gemma-4
- The multimodal reasoning models are licensed under Apache 2.0.
- Run E2B and E4B on 6GB RAM (including on phones); run 26B-A4B and 31B on ~18GB.
- GGUFs: https://huggingface.co/collections/unsloth/gemma-4
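As a rough sanity check on the memory figures above, here is a back-of-envelope sketch (not an official sizing formula) of how a quantized GGUF's file size relates to parameter count. The 4.5 bits/weight figure is an assumption for Q4_K-class quants:

```python
def quant_size_gb(n_params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Rough GGUF file size in GB for a quantized model.

    Assumes ~4.5 bits/weight, a ballpark for Q4_K-class quants;
    actual sizes vary by quant mix and embedding precision.
    """
    return n_params_billion * bits_per_weight / 8

# E4B (~4B params) at ~4.5 bpw -> ~2.25 GB, leaving headroom for
# KV cache and runtime overhead within a 6GB RAM budget.
print(quant_size_gb(4))   # ~2.25
# 26B-A4B (26B total params) -> ~14.6 GB, consistent with the ~18GB figure.
print(quant_size_gb(26))  # ~14.625
```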
Updates
- Tool calls for smaller models are now more stable and no longer get cut off.
- Context length is now properly applied.
- Tool calls for all models are now +30% to +80% more accurate.
- Web search now retrieves full page content instead of just summaries.
- The maximum number of tool calls has been increased from 10 to 25.
- Tool calls now terminate more reliably, reducing looping and repetition.
- Added more tool-call healing and de-duplication logic to stop tool calls from leaking XML.
- Tested with `unsloth/Qwen3.5-4B-GGUF` (UD-Q4_K_XL), with web search, code execution, and thinking enabled:
| Metric | Before | After |
|---|---|---|
| XML leaks in response | 10/10 runs | 0/10 runs |
| URL fetches used | 0/10 runs | 4/10 runs |
| Runs with correct song names | 0/10 | 2/10 |
| Avg tool calls | 5.5 | 3.8 |
| Avg response time (s) | 12.3 | 9.8 |
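To illustrate the tool-call fixes above, here is a minimal sketch of what XML leak stripping and de-duplication could look like. The tag name, function names, and logic are hypothetical, not Unsloth's actual implementation:

```python
import re

MAX_TOOL_CALLS = 25  # raised from 10 in this release

def heal_response(text: str) -> str:
    """Strip leaked tool-call XML tags from a model response.

    The <tool_call> tag name is a hypothetical example of the kind of
    markup that could leak into user-visible output.
    """
    return re.sub(r"</?tool_call>", "", text).strip()

def dedupe_calls(calls: list[str]) -> list[str]:
    """Drop consecutive duplicate tool calls to curb looping,
    then cap the total at MAX_TOOL_CALLS."""
    out: list[str] = []
    for call in calls:
        if not out or call != out[-1]:
            out.append(call)
    return out[:MAX_TOOL_CALLS]
```

For example, `heal_response("<tool_call>search('lyrics')</tool_call> done")` yields `"search('lyrics') done"`, and `dedupe_calls(["search", "search", "fetch"])` collapses the repeated call to `["search", "fetch"]`.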
Run Gemma 4 in Unsloth Studio:
What's Changed
- studio: Polish Windows installer/setup logs by @Imagineer99 in #4736
- feat: move folder management into model selector dropdown by @Shine1i in #4731
- fix: clear tool status badge immediately after tool execution by @Shine1i in #4733
- refactor flex attn to prefer flash if possible by @Datta0 in #4734
- Fix Windows local GGUF model loading crash by @danielhanchen in #4730
- Fix OOM model styling in Studio model selectors by @LeoBorcherding in #4738
- feat(studio): strip org prefix in model search to surface unsloth variants by @rolandtannous in #4749
- Fix forward compatibility with transformers 5.x by @danielhanchen in #4752
- Architecture-aware KV cache VRAM estimation (5-path) by @danielhanchen in #4757
- Fix save_pretrained_merged for full-finetuned models by @danielhanchen in #4755
- Feat/prebuiltllamacpp by @mmathew23 in #4741
- Add installer test coverage for prebuilt llama.cpp changes by @danielhanchen in #4756
- fix: studio web search SSL failures and empty page content by @danielhanchen in #4754
- fix: add tokenizers to no-torch deps and TORCH_CONSTRAINT for arm64 macOS py313+ by @danielhanchen in #4748
- fix(studio): allow context length slider to reach model's native limit by @danielhanchen in #4746
- Tests for architecture-aware KV cache estimation by @danielhanchen in #4760
- Fix custom llama.cpp source builds and macos metal source builds by @mmathew23 in #4762
- studio: align composer/code, unify fonts, and remove tool collapse jitter by @Imagineer99 in #4763
- fix(chat): correct loading text for cached models during inference by @AdamPlatin123 in #4764
- fix(security): shell injection in GGML export conversion by @mateeaaaaaaa in #4768
- Add regression test for shell injection fix in GGML conversion by @danielhanchen in #4773
- fix(studio): prevent small models from stalling on tool-calling tasks by @danielhanchen in #4769
- Add regression tests for custom llama prebuilt installer by @danielhanchen in #4772
- Feat/custom llama prebuilt by @mmathew23 in #4771
- studio: fix chat font changes leaking outside chat page by @Imagineer99 in #4775
- feat(studio): display images from Python tool execution in chat UI by @danielhanchen in #4778
- ui improvement by @rolandtannous in #4781
- UI Changes by @danielhanchen in #4782
- fix(studio): improve tool-calling re-prompt for small models by @danielhanchen in #4783
- Pin Gemma-4 transformers requirement to 5.5.0 stable by @danielhanchen in #4784
- Switch llama.cpp default to mainline ggml-org by @danielhanchen in #4785
- Use transformers v5.5-release branch, pin to 5.5.0 by @danielhanchen in #4786
- Fix: pin transformers==4.57.6 in main Studio venv by @danielhanchen in #4788
- fix(studio): build llama.cpp from master for Gemma 4 support by @danielhanchen in #4790
- fix name fixed name by @rolandtannous in #4791
- fix(studio): prioritize curated defaults in Recommended model list by @danielhanchen in #4792
- fix windows llama.cpp compile from source issue by @mmathew23 in #4793
- fix(studio): pin llama.cpp to b8637 (Gemma 4 support) by @danielhanchen in #4796
- fix(studio): don't set trust_remote_code for Gemma 4 training by @danielhanchen in #4795
- fix(studio): revert llama.cpp default tag to latest by @danielhanchen in #4797
- fix(studio): suppress fatal error when ggml-org has no prebuilt manifest by @danielhanchen in #4799
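The architecture-aware KV cache VRAM estimation mentioned in #4757 can be sketched with the standard dense-transformer formula below. This is a simplified illustration, not the 5-path implementation from the PR; all parameter names are assumptions:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, dtype_bytes: int = 2) -> int:
    """Estimate KV cache size for a dense transformer.

    Per token, each layer stores one K and one V vector per KV head:
    2 * n_layers * n_kv_heads * head_dim * dtype_bytes bytes.
    dtype_bytes=2 assumes fp16/bf16 cache; quantized caches use less.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * dtype_bytes

# Example: a Llama-style config with GQA (32 layers, 8 KV heads,
# head_dim 128) at 8192 context in fp16 needs exactly 1 GiB.
print(kv_cache_bytes(32, 8, 128, 8192) / 2**30)  # 1.0
```

Architecture awareness matters because grouped-query attention (fewer KV heads than query heads), sliding-window layers, and MoE variants each change which terms apply, which is presumably why the estimator needs multiple paths.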
New Contributors
- @AdamPlatin123 made their first contribution in #4764
- @mateeaaaaaaa made their first contribution in #4768
Full Changelog: v0.1.3-beta...v0.1.35-beta
