Our first release of 2026! This year we’ve got a lot of exciting things coming, and to kick things off we’re introducing faster MoE training, embedding model support, and ultra long context for Reinforcement Learning. We’ll also be launching our brand new UI very soon.
We’d like to thank all of you for 50K stars on GitHub! ⭐
We’ve also added support for many new models that you can now run and fine-tune locally, including DeepSeek-OCR 2, GLM-4.7-Flash, Kimi-2.5, and more.
🚀 Faster MoE training
You can now train MoE models 12× faster with 35% less VRAM and 6× longer context via our new Triton and math kernels, with no accuracy loss. gpt-oss-20b trains on just 12.8GB of VRAM, and Qwen3-30B-A3B (16-bit LoRA) uses 63GB.
Unsloth supports fast MoE training for gpt-oss, Qwen3 (30B, 235B, VL, Coder), DeepSeek R1/V3-architecture, and GLM (4.7, Flash) models.
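The faster path shouldn’t require code changes beyond updating Unsloth - the usual loading flow picks up the new kernels. A minimal sketch, where the model name and LoRA hyperparameters are illustrative rather than prescriptive:

```python
from unsloth import FastLanguageModel

# Load a supported MoE model; the new Triton/math kernels are applied
# on top of the standard Unsloth loading flow.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gpt-oss-20b",  # illustrative; any supported MoE model
    max_seq_length = 4096,
    load_in_4bit = True,  # QLoRA; the release quotes 12.8GB VRAM for gpt-oss-20b
)

# Attach LoRA adapters; r/alpha/target_modules are illustrative defaults.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)
```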
🔎 Embedding models now train 2× faster
We collaborated with Hugging Face to enable 1.8-3.3× faster training for embedding, BERT, and classifier models, with 20% less VRAM, 2× longer context, and no accuracy loss vs. FA2 setups.
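Loading an embedding model for fine-tuning might look like the following - a minimal sketch assuming the new `FastSentenceTransformer` (added in #3719) mirrors the `FastLanguageModel.from_pretrained` pattern; the checkpoint and keyword names are assumptions, so check the embedding notebooks for the exact API:

```python
from unsloth import FastSentenceTransformer

# Assumed API: we take for granted that FastSentenceTransformer follows the
# from_pretrained pattern of FastLanguageModel -- verify against the notebooks.
model = FastSentenceTransformer.from_pretrained(
    model_name = "sentence-transformers/all-MiniLM-L6-v2",  # illustrative checkpoint
    max_seq_length = 512,
)

# Downstream training can then use the standard sentence-transformers stack,
# e.g. SentenceTransformerTrainer with MultipleNegativesRankingLoss.
```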
💡 Ultra Long Context RL is here
We’re introducing new batching algorithms that enable ~7× longer-context RL training (in some cases more than 12×) with no accuracy or speed degradation vs. other optimized setups that use FA3, custom kernels, and chunked losses.
Unsloth now trains gpt-oss QLoRA with 380K context on a single 192GB NVIDIA B200 GPU.
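The long-context path plugs into the usual GRPO flow. A minimal sketch, assuming a trl-style `GRPOTrainer` setup as in Unsloth’s RL notebooks; the 380K figure matches the B200 claim above, and the dataset and reward function are toy placeholders:

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer
from unsloth import FastLanguageModel

# Long-context GRPO sketch; scale max_seq_length down for smaller GPUs.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gpt-oss-20b",
    max_seq_length = 380_000,  # the release's gpt-oss QLoRA figure on a 192GB B200
    load_in_4bit = True,       # QLoRA
    fast_inference = True,     # vLLM-backed rollouts for RL
)
model = FastLanguageModel.get_peft_model(model, r = 16, lora_alpha = 16)

# Toy prompt dataset and length-penalty reward -- replace with real ones.
dataset = Dataset.from_dict({"prompt": ["Explain KV caching in one paragraph."]})

def reward_shorter(completions, **kwargs):
    return [-float(len(c)) for c in completions]

trainer = GRPOTrainer(
    model = model,
    processing_class = tokenizer,
    reward_funcs = [reward_shorter],
    args = GRPOConfig(output_dir = "outputs", num_generations = 4,
                      max_completion_length = 1024),
    train_dataset = dataset,
)
trainer.train()
```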
🔮 New models
- 🐳 DeepSeek-OCR 2 - Run and fine-tune the new OCR model.
- 🥝 Kimi-2.5 - Run the SOTA model locally with Unsloth GGUFs.
- ⚡ GLM-4.7-Flash - Run and fine-tune the best-in-class 30B LLM.
🎉 Extra Updates
- As part of our MoE release, Gemma-3 now uses Flex-Attention by default, and this also works in float16 settings (float16 previously produced infinities, which we fixed a while back). Gemma-3 now uses O(N) memory instead of O(N^2) and trains >3× faster, scaling even better with context length. Previous Unsloth versions would OOM.
- Vision fine-tuning now accepts mixed datasets containing both image+text and text-only samples - see the sketch after this list!
- `trl==0.27.1` and `transformers==5.1.0` are now well supported - coverage was previously 30% of our 120 notebooks but is now >80%, and we plan to reach 100% over the next few days.
- And many, many other bug fixes and updates!
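For the mixed vision data point above, one dataset can now interleave the two sample types. A minimal sketch in the conversational format used by Unsloth’s vision notebooks (field names follow that format, but verify against your notebook version):

```python
from PIL import Image

pil_image = Image.new("RGB", (64, 64))  # stand-in for a real image

# An image+text sample and a text-only sample can now live in one dataset.
image_sample = {
    "messages": [
        {"role": "user", "content": [
            {"type": "image", "image": pil_image},
            {"type": "text",  "text": "Describe this image."},
        ]},
        {"role": "assistant", "content": [
            {"type": "text", "text": "A plain black square."},
        ]},
    ],
}

text_only_sample = {
    "messages": [
        {"role": "user", "content": [
            {"type": "text", "text": "What is LoRA?"},
        ]},
        {"role": "assistant", "content": [
            {"type": "text", "text": "A parameter-efficient fine-tuning method."},
        ]},
    ],
}

train_dataset = [image_sample, text_only_sample]  # both kinds together
```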
📖 New Guides
- </> How To Use Claude Code + Codex with local LLMs: Guide
- 👾 Train & deploy to LM Studio for local inference: Guide
- 🎨 Run Diffusion image models with Unsloth GGUFs: Guide
Tip
Update Unsloth via `pip install --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo`
If you want PyTorch 2.9: `pip install --upgrade unsloth unsloth_zoo`
February is shaping up to be an amazing month for LLM releases, and we hope you’re just as excited as we are. 😊
What's Changed
- [FIX] [Transformers] VLM input embeds fix for gradients by @Datta0 in #3715
- [fbgemm] Silence tma fbgemm by @Datta0 in #3735
- [hf_hub] Token login by @Datta0 in #3739
- Do not overwrite slots by @Datta0 in #3752
- Fix VLM + DDP checkpointing by @djsaunde in #3751
- Enable 4-bit quantization on AMD Radeon GPUs by @sstamenk in #3748
- Nightly by @danielhanchen in #3753
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #3760
- Nightly by @danielhanchen in #3767
- Add missing import of inspect by @sstamenk in #3778
- Clarify NotImplementedError for fast_inference with full_finetuning by @Fizza-Mukhtar in #3768
- Update FUNDING.yml by @danielhanchen in #3792
- fix(trainer): import psutil to prevent NameError in _prepare_dataset by @alkinun in #3780
- fastrope fix for zero strided tensors by @f14-bertolotti in #3782
- Fix crash when trl.experimental.openenv is unavailable by @Fizza-Mukhtar in #3787
- Fix Boolean value of Tensor ambiguity error in mistral.py by @yurekami in #3790
- fix: add support for init_lora_weights="corda" in get_peft_model by @majiayu000 in #3794
- Fix correctness bugs in rl.py, rl_replacements.py, and vision.py by @danielhanchen in #3811
- Fix correctness bugs across multiple model files by @danielhanchen in #3813
- Fix 3D tensor support for bitsandbytes 8-bit matmul in forward pass by @Fizza-Mukhtar in #3806
- FIX: weight tying for LoRA embeddings and lm_head by @oKatanaaa in #3711
- Fix Gemma3 QAT training instability with int8-int4 scheme by @danielhanchen in #3818
- Add helpful error messages for fast_generate when fast_inference=False by @danielhanchen in #3820
- Bug fixes by @danielhanchen in #3821
- Make llama.cpp CURL dependency optional when building from source by @Fizza-Mukhtar in #3822
- remove redundant code of has_block by @ykaitao in #3832
- rl.py fixes: buffer reset, safer attribute access, typo fix by @danielhanchen in #3834
- Respect user quantization_config by @danielhanchen in #3835
- Fix vLLM PDL bug on Blackwell GPUs (B200/B100) by @danielhanchen in #3841
- Sync chat_template from tokenizer to vLLM by @danielhanchen in #3842
- remove unused variable BlockDiagonalCausalMask by @ykaitao in #3836
- Replace GitHub API check with vLLM version check for PDL fix by @danielhanchen in #3849
- GRPO: restore model mode after generate (stacked on #3754) by @danielhanchen in #3851
- Fix model training state restoration in GRPO trainer by @numb3r33 in #3754
- Unify Version usage and fix TRL version handling by @danielhanchen in #3843
- [ModelScope] Disable stats when modelscope is being used by @Datta0 in #3857
- Fix FBGEMM/CUTLASS errors on SM100 (Blackwell) GPUs by @danielhanchen in #3863
- Feature/raw text dataprep by @Vangmay in #3612
- Fix Kaggle telemetry misclassification when COLAB_ keys exist by @hnxnq7 in #3869
- reduce code duplication by _offload_frozen_module_for_training by @ykaitao in #3865
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #3881
- wrong number of dimensions by @f14-bertolotti in #3880
- Disable gradient checkpointing when explicitly off for vision by @ducviet00 in #3879
- [trl] use non lora model as base for RL by @Datta0 in #3895
- Chunk Across Batch and Context length for logprob calculations for grpo by @pluesclues in #3628
- add weight-only int8 QAT scheme and update tests for torchao 0.15.0 by @electroglyph in #3859
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #3905
- Fix vllm ipykernel patch by @pluesclues in #3907
- Handle Transformers 5 vLLM import errors by @danielhanchen in #3908
- add FastSentenceTransformer for easily finetuning SentenceTransformer models by @electroglyph in #3719
- Guard torch.compile on ROCm when triton_key is missing by @hnxnq7 in #3923
- Grpo compile settings update by @pluesclues in #3927
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #3937
- chore: Update outdated GitHub Actions version by @pgoslatara in #3936
- [trl] vllm trl topk fixup by @Datta0 in #3935
- [fix] qwen3-guard tokenizer by @Datta0 in #3959
- fix for intel devices torch compile configs by @leizhenyuan in #3952
- Use standard gradient checkpointing for small sequence lengths by @danielhanchen in #3867
- reduce code duplication by @ykaitao in #3877
- Fix TRL 0.27.0 GRPO compatibility and PEFT model handling by @danielhanchen in #3969
- Fix Vision GRPO string prompts and OpenEnv async compatibility by @danielhanchen in #3964
- Fix num_train_epochs=None causing TypeError in GRPOConfig by @danielhanchen in #3972
- Add TRL truncation regression and metadata loss fixes (Fixes 1 and 3) by @danielhanchen in #3971
- Add vLLM + torch < 2.9.0 + SM100 compatibility check by @danielhanchen in #3973
- Fix torchvision compatibility check for source builds and future torch versions by @danielhanchen in #3978
- Trl 0.27.0 update by @pluesclues in #3965
- Prefer flex attention when available by @danielhanchen in #3979
- Fix GPT-OSS BlockMask error during inference by @danielhanchen in #3982
- Silence third-party deprecation warnings and fix socket leak by @danielhanchen in #3983
- Silence non-actionable TRL trainer import failures by @danielhanchen in #3980
- Add PyTorch 2.10 and xformers 0.0.34 support by @danielhanchen in #3985
- [MoE] Improve moe kernels for unsloth fine tuning by @Datta0 in #3812
- Fix RuntimeError not caught when torchcodec fails to load by @danielhanchen in #3987
- Fix cutlass inductor options for PyTorch < 2.8.0 by @danielhanchen in #3988
- Disable torchcodec in transformers when FFmpeg is missing by @danielhanchen in #3989
- Update rl_replacements.py to filter through correct trl version by @pluesclues in #3990
- Fix multiprocessing crash on Windows/macOS and unify num_proc logic by @danielhanchen in #3999
- Fix triton 3.6.0 + torch 2.9.x torch.compile crash (missing cluster_dims) by @danielhanchen in #4001
- Add push_to_hub_gguf support for FastSentenceTransformer by @Etherll in #4002
- [Feature] separate gguf file path by @RektPunk in #3934
- Refactor Ollama template wiring and harden packing helpers by @mmangkad in #3890
- Fix multi-GPU loading for quantized models in distributed training by @Fizza-Mukhtar in #3917
- Fix broken documentation links, typos, and formatting in README by @danielhanchen in #4003
- fix: inputs_embeds ignored when input_ids is not None in _fast_prepare_inputs_for_generation by @siddhudonda in #3814
- Fix notebook compatibility for transformers 4.57.6 and TRL 0.22-0.27 by @danielhanchen in #3998
- Fix VLM model + text-only dataset ValueError in TRL 0.22.x by @danielhanchen in #4004
- Fix trl.experimental thin wrapper compilation and OOM from peft_config overwrite by @danielhanchen in #4006
- Fix dtype mismatch in fp16 + 4-bit/8-bit LoRA training by @danielhanchen in #4005
- Silence TRL's batch_size=1 padding-free warning in compiled trainer source by @danielhanchen in #4007
- Silence peft target_parameters RuntimeWarning for MoE models by @danielhanchen in #4008
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #4009
- Suppress vLLM v1 executor sleep/wake log messages by @danielhanchen in #4011
- Inject model reference for dynamic token_type_ids detection in SFTTrainer by @danielhanchen in #4012
- Fix EmbeddingGemma float16 NaN via FORCE_FLOAT32 for gemma3_text by @danielhanchen in #4014
- Fix #3397: Prevent trainer tokenization hang with safe num_proc by @Fizza-Mukhtar in #4013
- add llama.cpp prefix to gguf conversion help messages by @rolandtannous in #4016
- [Misc] Fixes by @Datta0 in #4015
- FP8: Load model on-the-fly in vLLM by @andrewor14 in #3717
- Fix Gemma3 4B training on transformers 5.x (token_type_ids) by @danielhanchen in #4017
- Fix warmup_ratio deprecation for transformers >= 5.0 by @danielhanchen in #4019
- Misc fixes by @Datta0 in #4018
Unsloth Zoo Changes
- Fix training crash when using DoRA + 4-bit quantization by @Etherll in unslothai/unsloth-zoo#394
- fix for #392, transformers 5 by @electroglyph in unslothai/unsloth-zoo#393
- fix: adds missing import for torch.distributed by @namekian-mystifier in unslothai/unsloth-zoo#422
- Fix dtype mismatch in full finetuning + float16 inference by @danielhanchen in unslothai/unsloth-zoo#424
- Fix undefined variable 'e' in Version() function by @danielhanchen in unslothai/unsloth-zoo#425
- Fix correctness bugs in logging_utils.py and loss_utils.py by @danielhanchen in unslothai/unsloth-zoo#426
- Fix execute_with_time_limit start_method bug by @danielhanchen in unslothai/unsloth-zoo#428
- Fix OpenEnv PYTHONPATH auto-detection for compatibility by @danielhanchen in unslothai/unsloth-zoo#429
- Fix VARIANT_KWARG_KEYS import for peft >= 0.18.0 by @danielhanchen in unslothai/unsloth-zoo#430
- Fix ZeroDivisionError in fused cross entropy when GPU memory exhausted by @GabrielArpini in unslothai/unsloth-zoo#432
- Only enable gradient checkpointing when requested by @danielhanchen in unslothai/unsloth-zoo#433
- Removing import check in compiler.py by @Vidit-Ostwal in unslothai/unsloth-zoo#431
Unsloth Notebooks Changes
- Add Gemma phone deployment notebook by @glee2429 in unslothai/notebooks#146
- Use stable executorch 1.0.0 and optimum-executorch v0.1.0 by @danielhanchen in unslothai/notebooks#151
- Update 2048 RL notebook with training results by @danielhanchen in unslothai/notebooks#152
- Update 2048 RL notebook with extended training results by @danielhanchen in unslothai/notebooks#153
- new GRPO update notebooks by @pluesclues in unslothai/notebooks#155
- gemma3 1b changes by @pluesclues in unslothai/notebooks#156
- nemo gym multi environment notebook by @cmunley1 in unslothai/notebooks#158
- Add LFM2.5 notebooks by @mlabonne in unslothai/notebooks#159
- Revert "Add LFM2.5 notebooks" by @danielhanchen in unslothai/notebooks#161
- Restore UNSLOTH_VLLM_STANDBY in Kaggle Gemma3 Vision GRPO by @danielhanchen in unslothai/notebooks#163
- Grpo update gemma notebooks correctly and news lines for notebooks by @pluesclues in unslothai/notebooks#157
- Add LFM2.5 notebooks (reopen #159) by @danielhanchen in unslothai/notebooks#164
- GLM 4.7 Flash finetuning notebook by @Datta0 in unslothai/notebooks#166
- Embedding models notebooks by @Etherll in unslothai/notebooks#160
- add Qwen3_Embedding_0.6B notebook by @Etherll in unslothai/notebooks#167
- [UPDATE] Update openenv notebooks to use the latest implementation by @burtenshaw in unslothai/notebooks#165
- Fix Vision GRPO chat template and Orpheus column removal by @danielhanchen in unslothai/notebooks#171
- update nemo gym notebooks by @cmunley1 in unslothai/notebooks#169
- Fix Vision GRPO notebooks and Orpheus TTS compatibility by @danielhanchen in unslothai/notebooks#172
- Add AMD known issues note by @hnxnq7 in unslothai/notebooks#168
- Update Dockerfile_DGX_Spark by @XEL-Maker in unslothai/notebooks#162
- Revert PR #165 - OpenEnv notebooks by @danielhanchen in unslothai/notebooks#179
- Fix update_all_notebooks.py script improvements by @danielhanchen in unslothai/notebooks#176
- Making qwen 2.5 7b compatible with old trl versions by @pluesclues in unslothai/notebooks#177
- Fix Ministral VL installation cells by @danielhanchen in unslothai/notebooks#181
- Improve update_all_notebooks.py: format preservation, cross-platform fixes, parallelization by @danielhanchen in unslothai/notebooks#183
- Refactor update_all_notebooks.py: reorder sections, CRLF handling, README categories by @danielhanchen in unslothai/notebooks#184
- Separate OCR into its own README section by @danielhanchen in unslothai/notebooks#185
- [MoE] notebooks for Colab by @Datta0 in unslothai/notebooks#187
New Contributors
- @sstamenk made their first contribution in #3748
- @Fizza-Mukhtar made their first contribution in #3768
- @alkinun made their first contribution in #3780
- @f14-bertolotti made their first contribution in #3782
- @yurekami made their first contribution in #3790
- @majiayu000 made their first contribution in #3794
- @ykaitao made their first contribution in #3832
- @numb3r33 made their first contribution in #3754
- @Vangmay made their first contribution in #3612
- @hnxnq7 made their first contribution in #3869
- @ducviet00 made their first contribution in #3879
- @electroglyph made their first contribution in #3859
- @pgoslatara made their first contribution in #3936
- @RektPunk made their first contribution in #3934
- @mmangkad made their first contribution in #3890
- @siddhudonda made their first contribution in #3814
Full Changelog: December-2025...February-2026