We’re getting close to our final release of 2025! Thanks so much for sticking with us this year. We’ve got lots of new features, so please update Unsloth to get the latest improvements! 🦥
- Introducing FP8 Reinforcement Learning in Unsloth! Train on any FP8-capable GPU, 1.4x faster and with 60% less VRAM (a minimal loading sketch follows this list): Read our Blog/Guide • Notebooks: Qwen3-8B FP8 GRPO and Llama-3.2-1B FP8 GRPO
- You may notice Unsloth now uses much less VRAM than before, enabling even longer context. Faster training is also coming soon; we’ll share all the details in an upcoming blog.
- DeepSeek-OCR fine-tuning is here! We fine-tuned DeepSeek-OCR, improving its language understanding by 89%. Read our Blog • Free notebook
- Qwen3-VL models are now supported, including GGUFs to run locally: Blogpost + fixes • GGUFs
- We analyzed RL training-inference mismatch for FP16 vs. BF16 and concluded that Unsloth does not have this issue: Analysis and Results
- We’ve partnered with Docker to let you run LLMs locally with zero setup. Docker GGUFs are now powered by Unsloth Dynamic. Example:
  ```
  docker model run hf.co/unsloth/gpt-oss-20b-GGUF:F16
  ```
  Read guide
- Baidu ERNIE models are now supported. Notebooks coming soon.
- Unsloth now supports SGLang. Read our guide
- We wrote guides for LoRA Hot Swapping and vLLM Engine Arguments (a short sketch follows the Tip below)
- Run Kimi-K2-Thinking, the most powerful open model, locally: Kimi-K2 Guide
- Lots of bug fixes! See further below.
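To make the FP8 feature above concrete, here's a minimal loading sketch in the style of the FP8 GRPO notebooks. Treat `load_in_fp8` and the model name as illustrative assumptions; the notebooks and blog are the authoritative reference if your Unsloth version differs.

```python
from unsloth import FastLanguageModel

# Sketch: load a model for FP8 GRPO on an FP8-capable GPU (e.g. H100).
# `load_in_fp8` follows the FP8 GRPO notebooks; it is an assumption here,
# so check the blog/guide if your version exposes a different flag.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-8B",
    max_seq_length = 2048,
    load_in_fp8 = True,      # FP8 weights: ~1.4x faster RL, ~60% less VRAM
    fast_inference = True,   # vLLM-backed generation for GRPO rollouts
)
```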
Tip
Update Unsloth via:
```
pip install --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo
```
If you want PyTorch 2.9:
```
pip install --upgrade unsloth unsloth_zoo
```
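For the vLLM engine-argument and LoRA hot-swapping guides mentioned above, here is a rough sketch of the pattern Unsloth's GRPO notebooks use; the adapter path and engine-argument values are illustrative, not prescriptive.

```python
from unsloth import FastLanguageModel
from vllm import SamplingParams

# Engine arguments passed here are forwarded to the underlying vLLM engine.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-1B-Instruct",
    max_seq_length = 2048,
    fast_inference = True,          # turn on the vLLM backend
    gpu_memory_utilization = 0.7,   # example vLLM engine argument
    max_lora_rank = 32,
)

# LoRA hot swapping: attach a previously saved adapter at generation time,
# without reloading the base model. "my_saved_lora" is an illustrative path.
output = model.fast_generate(
    ["Why is the sky blue?"],
    sampling_params = SamplingParams(max_tokens = 128, temperature = 0.8),
    lora_request = model.load_lora("my_saved_lora"),
)
print(output[0].outputs[0].text)
```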
Bug Fixes and Enhancements
- Supports `trl>=0.25.0`, `vllm>=0.11.2`, and `transformers>=4.57.1`
- Fixed gpt-oss GRPO, RL excessive re-compilations on `torch>=2.9.0`
- Fixes Sleep mode and reduces memory usage by a further 5 to 15% for RL, GRPO
- Fix propagation of `trust_remote_code = True` (see the sketch after this list)
- Fix Unsloth offloaded gradient checkpointing not offloading on 1st step - reduces VRAM by >20%
- Add `logits.detach()` to GRPO to solve double backwards on some pathways
- Add `int64` kernels & fixed RoPE embeddings to allow super ultra long context training
- Fixed 📓 OpenEnv gpt-oss RL notebook
- DGX Spark docker image fixed
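For the `trust_remote_code` fix in the list above, a minimal sketch of the call it affects; the model name is a hypothetical stand-in for any repo that ships custom modeling code (e.g. DeepSeek-OCR).

```python
from unsloth import FastLanguageModel

# The flag is now propagated to the underlying transformers loaders
# (config, model, and tokenizer) instead of being silently dropped.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "some-org/model-with-custom-code",  # hypothetical repo
    max_seq_length = 4096,
    trust_remote_code = True,
)
```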
What's Changed
- Grpo gradient accumulation edits by @pluesclues in #3390
- Nightly by @danielhanchen in #3532
- Handle TRL version compatibility in rl_replacements.py by @pluesclues in #3540
- Bug fixes by @danielhanchen in #3546
- Sleep trl patch by @Datta0 in #3517
- Detach logits before returning from function by @pluesclues in #3554
- Fix typos in comment by @mk0walsk in #3557
- Formatting & bug fixes by @danielhanchen in #3563
- DeepseekOCR: add trust_remote_code kwarg by @mmathew23 in #3564
- pre-commit CI config by @djsaunde in #3565
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #3576
- Resize rope embeddings for long sequence training by @mmathew23 in #3586
- Patch in tiled mlp by @mmathew23 in #3584
- Support for out-of-source quantizers by @Giuseppe5 in #3534
- Fix: prevent rope_embedding AssertionError by checking kv_seq_len before reuse by @jarrycyx in #3578
- Extend TorchAOConfig to support mobile usecases by @metascroy in #3587
- fix qwen3 vl gradient accumulation by @mmathew23 in #3598
- Do not force set beta to 0 for DAPO by @Datta0 in #3604
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #3606
- Fix broken links and typo in README by @mk0walsk in #3611
- remove pre-commit workflow (covered by pre-commit app) by @djsaunde in #3618
- Add an int64 path for mlp kernels by @mmathew23 in #3614
- Remove grpo requirement bs=num_generations by @mmathew23 in #3609
- Enable FP8 + RL training for bf16 models by @andrewor14 in #3440
- Fix/save torchao model loading logic by @rolandtannous in #3621
- Fix LlamaModel_fast_forward signature to match HF Transformers (Support inputs_embeds) by @MercuryYen in #3623
- Add 128x128 PerBlock FP8 + RL by @andrewor14 in #3629
- Add trust_remote_code parameter to tokenizer by @Etherll in #3631
- [intel] change windows to remove windows-triton for intel xpu by @leizhenyuan in #3168
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #3634
- Float8 GRPO, RL by @danielhanchen in #3640
Unsloth Zoo Changes
- Bug fixes by @danielhanchen in unslothai/unsloth-zoo#327
- Fix GRPO by @danielhanchen in unslothai/unsloth-zoo#328
- fix gpt oss memory calculation for intel device by @leizhenyuan in unslothai/unsloth-zoo#330
- Bug fixes by @danielhanchen in unslothai/unsloth-zoo#331
- Bug fixes by @danielhanchen in unslothai/unsloth-zoo#332
- fixed unbound local error tokenizer-model from cache by @rolandtannous in unslothai/unsloth-zoo#333
- Now it works on a uv venv by @kittawere in unslothai/unsloth-zoo#336
- Gemma3n fix by @mmathew23 in unslothai/unsloth-zoo#338
- [Intel] remove triton windows for intel by @leizhenyuan in unslothai/unsloth-zoo#243
- FP8 training enhancements by @Datta0 in unslothai/unsloth-zoo#337
- GRPO gradient accumulation steps update and DAPO support by @pluesclues in unslothai/unsloth-zoo#308
- Fix/video collate by @mmathew23 in unslothai/unsloth-zoo#342
- Bug fixes by @danielhanchen in unslothai/unsloth-zoo#344
- FP8, Standby and vLLM updates by @Datta0 in unslothai/unsloth-zoo#340
- Put importance sampling into no grad by @pluesclues in unslothai/unsloth-zoo#343
- Detach hidden states to avoid gradient carry by @pluesclues in unslothai/unsloth-zoo#345
- Bug fixes by @danielhanchen in unslothai/unsloth-zoo#347
- MoE: Cast routing_weights dtype correctly by @mmathew23 in unslothai/unsloth-zoo#349
- return local model in determine_base_model_source with any quantization by @noah1510 in unslothai/unsloth-zoo#334
- Enable FP8 + RL training by @andrewor14 in unslothai/unsloth-zoo#351
- Tiled MLP Implementation by @mmathew23 in unslothai/unsloth-zoo#350
- Fix gradient checkpointing layer caller kwargs by @mmathew23 in unslothai/unsloth-zoo#353
- vLLM weight scale FP8 and standby override by @Datta0 in unslothai/unsloth-zoo#354
- Fix docstring removing regex to support empty parentheses by @noisycat3 in unslothai/unsloth-zoo#360
Unsloth Notebooks Changes
- Feat/qwen3 vl by @Erland366 in unslothai/notebooks#119
- Feat/double footer fix by @Erland366 in unslothai/notebooks#121
- Add GGUF section for Qwen3-VL by @Etherll in unslothai/notebooks#123
- Fix TypeError in unsloth_push_to_hub_gguf() when pushing GGUF model to Hugging Face by @samanta-sc in unslothai/notebooks#125
- fix TorchAOConfig' object has no attribute 'base_config' error by @rolandtannous in unslothai/notebooks#129
- Updated Dockerfile for DGX Spark by @sameersegal in unslothai/notebooks#133
- gemma3-270m: reduce batch size for sample packing by @djsaunde in unslothai/notebooks#135
- fix dataset formatting and mapping for Magistral reasoning by @rolandtannous in unslothai/notebooks#136
- fix magistral inference by @rolandtannous in unslothai/notebooks#138
New Contributors
- @mk0walsk made their first contribution in #3557
- @pre-commit-ci[bot] made their first contribution in #3576
- @Giuseppe5 made their first contribution in #3534
- @jarrycyx made their first contribution in #3578
- @MercuryYen made their first contribution in #3623
Full Changelog: October-2025...November-2025