
gpt-oss is here! ✨
Finetune gpt-oss for free with our Unsloth Colab notebook!
- We’ve managed to get gpt-oss training on just 14GB of VRAM, so it works on free Colab thanks to our linear conversions. For more details, read our Guide/Blogpost
- Fine-tuning gpt-oss is 1.5x faster and uses 50% less VRAM with Unsloth. The gpt-oss-120b model fits in 65GB of VRAM.
- Model uploads: 20b GGUF • 120b GGUF • All uploads
🦥 Unsloth updates
- We’ve made algorithmic updates to Unsloth so every model now trains faster and with less VRAM, no matter which model you use.
- Unsloth now works on RTX 50 and Blackwell GPUs. Read our guide.
- Official Unsloth Docker image coming very soon!
- You can now run Unsloth models directly via Docker:
docker model pull hf.co/unsloth/gpt-oss-20b-GGUF
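After pulling, the model can be chatted with locally; a minimal sketch, assuming Docker Desktop with the Model Runner feature enabled (the `docker model` subcommands):

```shell
# Pull the GGUF model from Hugging Face (as above)
docker model pull hf.co/unsloth/gpt-oss-20b-GGUF

# Start an interactive chat session with the pulled model
docker model run hf.co/unsloth/gpt-oss-20b-GGUF
```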
🌠 Qwen3-Coder + Qwen3-2507
Qwen released July 2025 updates, dubbed 'Qwen3-2507', and launched their SOTA coding models!
- Qwen3-Coder (with Unsloth fixes): Guide • Coder uploads
- Qwen3-2507: Guide • 2507 uploads
- Fine-tune Qwen3-4B-2507 with our Colab notebook
🔮 New models + Support
Run these new models:
- Kimi-K2: Guide • GGUF
- GLM: 4.5-Air • 4.5 • 4-32B-0414
- Orpheus-3B • Hunyuan-A13B
Unsloth also now supports running + training for:
- We collaborated with the Liquid & TII teams to support training for Falcon-H1-7B and LFM2-1.2B!
- Devstral-2507 • Magistral-2507 • SmolLM3-3B
Don't forget to also join our Reddit: r/unsloth 🥰
What's Changed
- Fix argument mismatch in GRPO _get_per_token_logps lambda function by @rolandtannous in #2929
- patch falcon h1 inference by @mmathew23 in #2932
- Fix falcon H1 dropout issue by @Datta0 in #2938
- fix: change lora_dropout from int to float for type consistency by @muzzlol in #2949
- GRPO fix dataloader_num_workers value error in GRPOTrainer by @rolandtannous in #2944
- GRPO Fix - Support vllm pre-dequantized quantization states in fast_dequantize kernel by @rolandtannous in #2943
- Bug fixes by @danielhanchen in #2982
- Update unsloth-cli.py by @qgallouedec in #2985
- use fastmodel falcon h1 by @mmathew23 in #2987
- Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model merge error by @rolandtannous in #2986
- Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized model merge error" by @danielhanchen in #2988
- Revert "Revert "Add Qwen2.5-VL-32B-Instruct mapping to fix quantized … by @danielhanchen in #2990
- Bug fixes by @danielhanchen in #2998
- Update README.md by @qgallouedec in #2991
- Bug fixes by @danielhanchen in #3017
- [bugs] fix for casual mask by @leizhenyuan in #3011
- [intel] add for intel path for llama.py by @leizhenyuan in #3012
- Fix Gemma 2 by @danielhanchen in #3024
- falcon h1 force float32 when dtype is torch.float16 by @mmathew23 in #3026
- Fix torch compile issues by @danielhanchen in #3028
- Fix Llama and Gemma inference by @Erland366 in #3034
- Fixup multi GPU workload. by @Datta0 in #3049
- Bug Fixes and Enhancements for Model Loading by @Etherll in #3052
- Add gemma-3n chat template to chat_templates.py by @Etherll in #3051
- Fix: Added specific check for Gemma so models like BERT properly init… by @Sekinal in #3055
- fixup rope sync for everything by @Datta0 in #3061
- get_per_token_logps_and_entropies: return tuple instead of dict by @mmathew23 in #3080
- Docs: Add WSL Installation Guide for Blackwell / RTX 5090 GPU by @dongbin-lunark in #3079
- GPT-OSS support by @mmathew23 in #3099
- Nightly by @danielhanchen in #3102
- gpt-oss manually call temporary patch by @mmathew23 in #3104
New Contributors
- @muzzlol made their first contribution in #2949
- @Sekinal made their first contribution in #3055
- @dongbin-lunark made their first contribution in #3079
Full Changelog: July-2025...August-2025