
gpt-oss is here! ✨
Finetune gpt-oss for free with our Unsloth Colab notebook!
- We’ve managed to get gpt-oss training on just 14GB of VRAM, so it works on free Colab thanks to our linear conversions. For more details, read our Guide/Blogpost
- Fine-tuning gpt-oss is 1.5x faster and uses 50% less VRAM with Unsloth. The gpt-oss-120b model fits in 65GB of VRAM.
- Model uploads: 20b GGUF • 120b GGUF • All uploads
🦥 Unsloth updates
- We’ve made algorithmic updates to Unsloth so every model now trains faster and with less VRAM, no matter which model you use.
- Unsloth now works on RTX 50 and Blackwell GPUs. Read our guide.
- Official Unsloth Docker image coming very soon!
- You can now run Unsloth models directly via Docker:
docker model pull hf.co/unsloth/gpt-oss-20b-GGUF
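After pulling, the model can be chatted with locally; a minimal sketch, assuming Docker Desktop with the Model Runner feature enabled (the `docker model` subcommands):

```shell
# Pull the GGUF model from Hugging Face (as above)
docker model pull hf.co/unsloth/gpt-oss-20b-GGUF

# Start an interactive chat session with the pulled model
docker model run hf.co/unsloth/gpt-oss-20b-GGUF
```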
🌠 Qwen3-Coder + Qwen3-2507
Qwen released July 2025 updates, dubbed 'Qwen3-2507', and launched their SOTA coding models!
- Qwen3-Coder (with Unsloth fixes): Guide • Coder uploads
- Qwen3-2507: Guide • 2507 uploads
- Fine-tune Qwen3-4B-2507 with our Colab notebook
🔮 New models + Support
Run these new models:
- Kimi-K2: Guide • GGUF
- GLM: 4.5-Air • 4.5 • 4-32B-0414
- Orpheus-3B • Hunyuan-A13B
Unsloth also now supports running + training for:
- We collaborated with the Liquid & TII teams to support training for Falcon-H1-7B and LFM2-1.2B!
- Devstral-2507 • Magistral-2507 • SmolLM3-3B
Don't forget to also join our Reddit: r/unsloth 🥰
What's Changed
- Fix argument mismatch in GRPO _get_per_token_logps lambda function by @rolandtannous in #2929
- patch falcon h1 inference by @mmathew23 in #2932
- Fix falcon H1 dropout issue by @Datta0 in #2938
- fix: change lora_dropout from int to float for type consistency by @muzzlol in #2949
- GRPO fix dataloader_num_workers value error in GRPOTrainer by @rolandtannous in #2944
- GRPO Fix - Support vllm pre-dequantized quantization states in fast_dequantize kernel by @rolandtannous in #2943
- Bug fixes by @danielhanchen in #2982
- Update unsloth-cli.py by @qgallouedec in #2985
- use fastmodel falcon h1 by @mmathew23 in #2987
- Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model merge error by @rolandtannous in #2986
- Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model merge error" by @danielhanchen in #2988
- Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized … by @danielhanchen in #2990
- Bug fixes by @danielhanchen in #2998
- Update README.md by @qgallouedec in #2991
- Bug fixes by @danielhanchen in #3017
- [bugs] fix for casual mask by @leizhenyuan in #3011
- [intel] add for intel path for llama.py by @leizhenyuan in #3012
- Fix Gemma 2 by @danielhanchen in #3024
- falcon h1 force float32 when dtype is torch.float16 by @mmathew23 in #3026
- Fix torch compile issues by @danielhanchen in #3028
- Fix Llama and Gemma inference by @Erland366 in #3034
- Fixup multi GPU workload. by @Datta0 in #3049
- Bug Fixes and Enhancements for Model Loading by @Etherll in #3052
- Add gemma-3n chat template to chat_templates.py by @Etherll in #3051
- Fix: Added specific check for Gemma so models like BERT properly init… by @Sekinal in #3055
- fixup rope sync for everything by @Datta0 in #3061
- get_per_token_logps_and_entropies: return tuple instead of dict by @mmathew23 in #3080
- Docs: Add WSL Installation Guide for Blackwell / RTX 5090 GPU by @dongbin-lunark in #3079
- GPT-OSS support by @mmathew23 in #3099
- Nightly by @danielhanchen in #3102
- gpt-oss manually call temporary patch by @mmathew23 in #3104
New Contributors
- @muzzlol made their first contribution in #2949
- @Sekinal made their first contribution in #3055
- @dongbin-lunark made their first contribution in #3079
Full Changelog: July-2025...August-2025