We’re getting close to our final release of 2025! Thanks so much for sticking with us this year. We’ve got lots of new features, so please update Unsloth to get the latest improvements! 🦥
- Introducing FP8 Reinforcement Learning in Unsloth! Train on any FP8-capable GPU, 1.4x faster and with 60% less VRAM (a minimal loading sketch follows this list): Read our Blog/Guide • Notebooks: Qwen3-8B FP8 GRPO and Llama-3.2-1B FP8 GRPO
- You may notice Unsloth now uses much less VRAM than before, enabling even longer context. Faster training is also coming soon; we’ll share all the details in an upcoming blog.
- DeepSeek-OCR fine-tuning is here! We fine-tuned DeepSeek-OCR, improving its language understanding by 89%. Read our Blog • Free notebook
- Qwen3-VL models are now supported, including GGUFs to run locally: Blogpost + fixes • GGUFs
- We analyzed RL training-inference mismatch for FP16 vs. BF16 and concluded that Unsloth does not have this issue: Analysis and Results
- We’ve partnered with Docker to let you run LLMs locally with zero setup. Docker GGUFs are now powered by Unsloth Dynamic. Example:
  ```
  docker model run hf.co/unsloth/gpt-oss-20b-GGUF:F16
  ```
  Read guide
- Baidu ERNIE models are now supported. Notebooks coming soon.
- Unsloth now supports SGLang. Read our guide
- We wrote guides for LoRA Hot Swapping and vLLM Engine Arguments (a short sketch follows the Tip below)
- Run Kimi-K2-Thinking, the most powerful open model, locally: Kimi-K2 Guide
- Lots of bug fixes! See further below.
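To make the FP8 feature above concrete, here's a minimal loading sketch in the style of the FP8 GRPO notebooks. Treat `load_in_fp8` and the model name as illustrative assumptions; the notebooks and blog are the authoritative reference if your Unsloth version differs.

```python
from unsloth import FastLanguageModel

# Sketch: load a model for FP8 GRPO on an FP8-capable GPU (e.g. H100).
# `load_in_fp8` follows the FP8 GRPO notebooks; it is an assumption here,
# so check the blog/guide if your version exposes a different flag.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-8B",
    max_seq_length = 2048,
    load_in_fp8 = True,      # FP8 weights: ~1.4x faster RL, ~60% less VRAM
    fast_inference = True,   # vLLM-backed generation for GRPO rollouts
)
```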
Tip
Update Unsloth via:
```
pip install --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo
```
If you want PyTorch 2.9:
```
pip install --upgrade unsloth unsloth_zoo
```
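For the vLLM engine-argument and LoRA hot-swapping guides mentioned above, here is a rough sketch of the pattern Unsloth's GRPO notebooks use; the adapter path and engine-argument values are illustrative, not prescriptive.

```python
from unsloth import FastLanguageModel
from vllm import SamplingParams

# Engine arguments passed here are forwarded to the underlying vLLM engine.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-1B-Instruct",
    max_seq_length = 2048,
    fast_inference = True,          # turn on the vLLM backend
    gpu_memory_utilization = 0.7,   # example vLLM engine argument
    max_lora_rank = 32,
)

# LoRA hot swapping: attach a previously saved adapter at generation time,
# without reloading the base model. "my_saved_lora" is an illustrative path.
output = model.fast_generate(
    ["Why is the sky blue?"],
    sampling_params = SamplingParams(max_tokens = 128, temperature = 0.8),
    lora_request = model.load_lora("my_saved_lora"),
)
print(output[0].outputs[0].text)
```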
Bug Fixes and Enhancements
- Supports `trl>=0.25.0`, `vllm>=0.11.2`, and `transformers>=4.57.1`
- Fixed gpt-oss GRPO, RL excessive re-compilations on `torch>=2.9.0`
- Fixes Sleep mode and reduces memory usage by a further 5 to 15% for RL, GRPO
- Fix propagation of `trust_remote_code = True` (see the sketch after this list)
- Fix Unsloth offloaded gradient checkpointing not offloading on 1st step - reduces VRAM by >20%
- Add `logits.detach()` to GRPO to solve double backwards on some pathways
- Add `int64` kernels & fixed RoPE embeddings to allow super ultra long context training
- Fixed 📓 OpenEnv gpt-oss RL notebook
- DGX Spark docker image fixed
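For the `trust_remote_code` fix in the list above, a minimal sketch of the call it affects; the model name is a hypothetical stand-in for any repo that ships custom modeling code (e.g. DeepSeek-OCR).

```python
from unsloth import FastLanguageModel

# The flag is now propagated to the underlying transformers loaders
# (config, model, and tokenizer) instead of being silently dropped.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "some-org/model-with-custom-code",  # hypothetical repo
    max_seq_length = 4096,
    trust_remote_code = True,
)
```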
What's Changed
- Grpo gradient accumulation edits by @pluesclues in #3390
- Nightly by @danielhanchen in #3532
- Handle TRL version compatibility in rl_replacements.py by @pluesclues in #3540
- Bug fixes by @danielhanchen in #3546
- Sleep trl patch by @Datta0 in #3517
- Detach logits before returning from function by @pluesclues in #3554
- Fix typos in comment by @mk0walsk in #3557
- Formatting & bug fixes by @danielhanchen in #3563
- DeepseekOCR: add trust_remote_code kwarg by @mmathew23 in #3564
- pre-commit CI config by @djsaunde in #3565
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #3576
- Resize rope embeddings for long sequence training by @mmathew23 in #3586
- Patch in tiled mlp by @mmathew23 in #3584
- Support for out-of-source quantizers by @Giuseppe5 in #3534
- Fix: prevent rope_embedding AssertionError by checking kv_seq_len before reuse by @jarrycyx in #3578
- Extend TorchAOConfig to support mobile usecases by @metascroy in #3587
- fix qwen3 vl gradient accumulation by @mmathew23 in #3598
- Do not force set beta to 0 for DAPO by @Datta0 in #3604
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #3606
- Fix broken links and typo in README by @mk0walsk in #3611
- remove pre-commit workflow (covered by pre-commit app) by @djsaunde in #3618
- Add an int64 path for mlp kernels by @mmathew23 in #3614
- Remove grpo requirement bs=num_generations by @mmathew23 in #3609
- Enable FP8 + RL training for bf16 models by @andrewor14 in #3440
- Fix/save torchao model loading logic by @rolandtannous in #3621
- Fix LlamaModel_fast_forward signature to match HF Transformers (Support inputs_embeds) by @MercuryYen in #3623
- Add 128x128 PerBlock FP8 + RL by @andrewor14 in #3629
- Add trust_remote_code parameter to tokenizer by @Etherll in #3631
- [intel] change windows to remove windows-triton for intel xpu by @leizhenyuan in #3168
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #3634
- Float8 GRPO, RL by @danielhanchen in #3640
Unsloth Zoo Changes
- Bug fixes by @danielhanchen in unslothai/unsloth-zoo#327
- Fix GRPO by @danielhanchen in unslothai/unsloth-zoo#328
- fix gpt oss memory calculation for intel device by @leizhenyuan in unslothai/unsloth-zoo#330
- Bug fixes by @danielhanchen in unslothai/unsloth-zoo#331
- Bug fixes by @danielhanchen in unslothai/unsloth-zoo#332
- fixed unbound local error tokenizer-model from cache by @rolandtannous in unslothai/unsloth-zoo#333
- Now it works on a uv venv by @kittawere in unslothai/unsloth-zoo#336
- Gemma3n fix by @mmathew23 in unslothai/unsloth-zoo#338
- [Intel] remove triton windows for intel by @leizhenyuan in unslothai/unsloth-zoo#243
- FP8 training enhancements by @Datta0 in unslothai/unsloth-zoo#337
- GRPO gradient accumulation steps update and DAPO support by @pluesclues in unslothai/unsloth-zoo#308
- Fix/video collate by @mmathew23 in unslothai/unsloth-zoo#342
- Bug fixes by @danielhanchen in unslothai/unsloth-zoo#344
- FP8, Standby and vLLM updates by @Datta0 in unslothai/unsloth-zoo#340
- Put importance sampling into no grad by @pluesclues in unslothai/unsloth-zoo#343
- Detach hidden states to avoid gradient carry by @pluesclues in unslothai/unsloth-zoo#345
- Bug fixes by @danielhanchen in unslothai/unsloth-zoo#347
- MoE: Cast routing_weights dtype correctly by @mmathew23 in unslothai/unsloth-zoo#349
- return local model in determine_base_model_source with any quantization by @noah1510 in unslothai/unsloth-zoo#334
- Enable FP8 + RL training by @andrewor14 in unslothai/unsloth-zoo#351
- Tiled MLP Implementation by @mmathew23 in unslothai/unsloth-zoo#350
- Fix gradient checkpointing layer caller kwargs by @mmathew23 in unslothai/unsloth-zoo#353
- vLLM weight scale FP8 and standby override by @Datta0 in unslothai/unsloth-zoo#354
- Fix docstring removing regex to support empty parentheses by @noisycat3 in unslothai/unsloth-zoo#360
Unsloth Notebooks Changes
- Feat/qwen3 vl by @Erland366 in unslothai/notebooks#119
- Feat/double footer fix by @Erland366 in unslothai/notebooks#121
- Add GGUF section for Qwen3-VL by @Etherll in unslothai/notebooks#123
- Fix TypeError in unsloth_push_to_hub_gguf() when pushing GGUF model to Hugging Face by @samanta-sc in unslothai/notebooks#125
- fix TorchAOConfig' object has no attribute 'base_config' error by @rolandtannous in unslothai/notebooks#129
- Updated Dockerfile for DGX Spark by @sameersegal in unslothai/notebooks#133
- gemma3-270m: reduce batch size for sample packing by @djsaunde in unslothai/notebooks#135
- fix dataset formatting and mapping for Magistral reasoning by @rolandtannous in unslothai/notebooks#136
- fix magistral inference by @rolandtannous in unslothai/notebooks#138
New Contributors
- @mk0walsk made their first contribution in #3557
- @pre-commit-ci[bot] made their first contribution in #3576
- @Giuseppe5 made their first contribution in #3534
- @jarrycyx made their first contribution in #3578
- @MercuryYen made their first contribution in #3623
Full Changelog: October-2025...November-2025