We're excited to announce vision model support for RL, plus even more memory-efficient and faster RL!
Unsloth now supports vision/multimodal RL with Gemma 3 and Qwen2.5-VL. Thanks to Unsloth's unique weight sharing and custom kernels, VLM RL is 1.5–2× faster, uses 90% less VRAM, and enables 10× longer context lengths than FA2 setups, with no accuracy loss.
- Qwen2.5-VL GSPO notebook
- Gemma 3 (4B) Vision GSPO notebook
Full details in our blog post: https://docs.unsloth.ai/new/vision-reinforcement-learning-vlm-rl
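For context, GSPO (Group Sequence Policy Optimization) replaces GRPO's per-token importance ratios with a single length-normalized sequence-level ratio, then applies the usual PPO-style clipping to it. A minimal numeric sketch of that idea (the function name and `eps` value are illustrative, not Unsloth's API):

```python
import math

def gspo_weight(new_logps, old_logps, advantage, eps=0.2):
    """Sequence-level GSPO importance weight (sketch).

    GSPO uses one length-normalized sequence ratio
        s = exp((sum(new_logps) - sum(old_logps)) / |y|)
    instead of GRPO's per-token ratios, then clips it PPO-style.
    """
    n = len(new_logps)
    s = math.exp((sum(new_logps) - sum(old_logps)) / n)
    clipped = max(min(s, 1 + eps), 1 - eps)
    # Pessimistic (min) side of the clipped objective, as in PPO/GRPO
    return min(s * advantage, clipped * advantage)

# Example: a 4-token response whose sequence likelihood rose slightly
w = gspo_weight([-1.0, -0.5, -0.2, -0.3],
                [-1.1, -0.6, -0.25, -0.35],
                advantage=1.0)
```

Because the ratio is averaged over the whole sequence, a single outlier token can no longer blow up the update the way it can with token-level ratios.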
- This update also introduces support for Qwen's GSPO algorithm.
- Our new vision RL support is also even faster and more memory efficient! New kernels and algorithms enable faster RL for both text and vision LLMs with 50% less VRAM and 10× more context.
- Introducing a new RL feature called 'Standby'. Previously, RL required splitting the GPU between training and inference. With Unsloth Standby, you no longer have to; Standby uniquely limits speed degradation compared to other implementations and sometimes even makes training faster! Read our Blog
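To try Standby, our blog describes enabling it via an environment variable set before `unsloth` is imported. A minimal sketch (the variable name follows the blog; the commented-out model call is illustrative and requires a GPU):

```python
import os

# Assumption: per the Unsloth blog, Standby is toggled with this environment
# variable, and it must be set BEFORE `unsloth` is imported anywhere.
os.environ["UNSLOTH_VLLM_STANDBY"] = "1"

# Requires a GPU, so omitted here:
# from unsloth import FastLanguageModel
# model, tokenizer = FastLanguageModel.from_pretrained(
#     model_name="unsloth/Qwen2.5-VL-7B-Instruct",  # illustrative checkpoint
#     fast_inference=True,  # vLLM-backed generation for RL rollouts
# )
```

With Standby on, training and inference share the same GPU memory pool instead of each reserving a fixed fraction up front.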

- We released Aider Polyglot benchmarks for our DeepSeek-V3.1 Dynamic GGUFs, and Unsloth quants perform consistently better than other quantizations. Blog

Don't forget to also join our Reddit: r/unsloth 🥰
What's Changed
- GPT OSS Bug fixes by @danielhanchen in #3231
- tests for mxfp4 and quantized models merge fix unsloth zoo pr 254 by @rolandtannous in #3223
- Update mistral.py, showed flag to not call cut cross entropy by @pluesclues in #3233
- Remove old version constraint in dependency list by @timkpaine in #3237
- chore: Fix Typos by @DefiWimar7 in #3246
- Fix incorrect function call in test_qwen3_grpo.py by @stevenxdavis in #3212
- [Intel] make intel device support ROPE by @leizhenyuan in #3164
- Support saving locally in model.save_pretrained_torchao by @jerryzh168 in #3263
- fixed save_pretrained_torchao and associated tests by @rolandtannous in #3264
- patch sftrainer to disable _is_vlm by @mmathew23 in #3265
- Bug fixes by @danielhanchen in #3266
- Filter vllm executor log by @Datta0 in #3268
- llama vision inference fix by @mmathew23 in #3270
- Add TorchAO quantization tests with FP16 models and serialization workarounds by @rolandtannous in #3269
- GptAttention turn training off during inference by @mmathew23 in #3289
- Add support for QAT full fine-tuning by @andrewor14 in #3238
- simplify unsloth_base_fast_generate by @mmathew23 in #3291
- Bug fixes by @danielhanchen in #3295
- [ROCm] add hip device path by @billishyahao in #3301
- Bug fixes by @danielhanchen in #3322
- Add support for modules_to_save in FastModel.get_peft_model by @l1ghtsource in #3317
- Fast Inference with vLLM for VLMs by @Datta0 in #2975
- TRL Updated version of VLM GRPO update along with GSPO by @pluesclues in #3132
New Contributors
- @timkpaine made their first contribution in #3237
- @stevenxdavis made their first contribution in #3212
- @l1ghtsource made their first contribution in #3317
Full Changelog: August-2025-v2...September-2025-v2