✨ Gemma 3n now available
- Google's new Gemma 3n multimodal models in 2B (E2B) and 4B (E4B) sizes
- Supports audio, vision, video and text inputs
- Available in safetensors, GGUF, and dynamic 4-bit BnB formats for fine-tuning (see the loading sketch below)
- HuggingFace Collection Link: Gemma-3n
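A minimal loading sketch, under stated assumptions: the `FastModel` class is assumed to mirror the `FastLanguageModel` pattern used in the GRPO example below, and the model id `unsloth/gemma-3n-E4B-it` is hypothetical, so verify both against the Gemma-3n collection and notebooks.

```python
from unsloth import FastModel  # assumed multimodal loader; see the Gemma-3n notebooks

model, tokenizer = FastModel.from_pretrained(
    model_name = "unsloth/gemma-3n-E4B-it",  # hypothetical repo id -- check the collection
    max_seq_length = 2048,
    load_in_4bit = True,  # dynamic 4-bit BnB variant for fine-tuning
)
```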
🎵 Text-to-Speech (TTS) Fine-tuning
- Train TTS/STT models like Sesame-CSM, Orpheus-TTS and OpenAI's Whisper locally!
- Clone voices and learn new emotions, tones & styles with 1.5x faster training and 50% less VRAM
- TTS notebooks: https://docs.unsloth.ai/get-started/unsloth-notebooks#text-to-speech-tts-notebooks
Tip
Update Unsloth via `pip install --upgrade --force-reinstall unsloth unsloth_zoo`
🧠 DeepSeek-R1-0528 Support with Dynamic 1-bit GGUFs
- Fine-tune DeepSeek-R1-0528-Qwen3 with GRPO! Our new reward function increases multilingual response rates by 40%+
- Dynamic 1-bit GGUFs shrink the full 715GB model to just 185GB (a 75% size reduction) with minimal accuracy loss
- DeepSeek-R1-0528-Qwen3 notebook: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/DeepSeek_R1_0528_Qwen3_(8B)_GRPO.ipynb
📈 Dynamic 2.0 GGUFs
- Our new quantization method outperforms other leading quantization methods (see the local-inference sketch after this list)
- Sets new benchmarks for 5-shot MMLU and KL Divergence
- Selectively quantizes layers for optimal accuracy
- For more information: https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs
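If you just want to run a Dynamic 2.0 GGUF locally, here is a minimal sketch using llama-cpp-python; the repo id and quant filename are assumptions for illustration, so substitute the actual GGUF you download.

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Hypothetical repo id and quant filename -- replace with the Dynamic 2.0 GGUF you use
llm = Llama.from_pretrained(
    repo_id = "unsloth/Qwen3-8B-GGUF",
    filename = "*UD-Q4_K_XL*.gguf",  # "UD-" naming for Unsloth Dynamic quants is an assumption
    n_gpu_layers = -1,               # offload all layers to GPU if available
    n_ctx = 4096,
)
out = llm.create_chat_completion(
    messages = [{"role": "user", "content": "What is 1+1?"}],
    max_tokens = 64,
)
print(out["choices"][0]["message"]["content"])
```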
⚡ Advanced Qwen3 GRPO notebook
- Proximity scoring for more nuanced reward functions
- OpenR1 dataset support with advanced templates
- Pre-fine-tuning so the model already knows the reasoning format before GRPO starts
- https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_(4B)-GRPO.ipynb
# DeepSeek-R1 GRPO Fine-tuning Example: convert DeepSeek-R1-0528-Qwen3-8B into a reasoning model via GRPO using OpenR1's Math dataset.
from unsloth import FastLanguageModel
import torch

max_seq_length = 1024 # Can increase for longer reasoning traces
lora_rank = 32 # Larger rank = smarter, but slower

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/DeepSeek-R1-0528-Qwen3-8B",
    max_seq_length = max_seq_length,
    load_in_4bit = True, # False for LoRA 16bit
    fast_inference = True, # Enable vLLM fast inference
    max_lora_rank = lora_rank,
    gpu_memory_utilization = 0.7, # Reduce if out of memory
)

model = FastLanguageModel.get_peft_model(
    model,
    r = lora_rank, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha = lora_rank*2, # *2 speeds up training
    use_gradient_checkpointing = "unsloth", # Reduces memory usage
    random_state = 3407,
)
reasoning_start = None
reasoning_end = None
user_token = None
assistant_token = None
# Detect the model's special reasoning and chat-role tokens
for token in tokenizer.get_added_vocab().keys():
    if "think" in token and "/" in token:
        reasoning_end = token
    elif "think" in token:
        reasoning_start = token
    elif "user" in token:
        user_token = token
    elif "assistant" in token:
        assistant_token = token

system_prompt = \
f"""You are given a problem.
Think about the problem and provide your working out.
You must think in Bahasa Indonesia."""

print(tokenizer.apply_chat_template([
    {"role" : "user", "content" : "What is 1+1?"},
    {"role" : "assistant", "content" : "<think>I think it's 2.2</think>2"},
    {"role" : "user", "content" : "What is 1+1?"},
    {"role" : "assistant", "content" : "<think>I think it's 2.2</think>2"},
], tokenize = False, add_generation_prompt = True))
from datasets import load_dataset
dataset = load_dataset("open-r1/DAPO-Math-17k-Processed", "en", split = "train")
def extract_hash_answer(text):
    # if "####" not in text: return None
    # return text.split("####")[1].strip()
    return text

dataset = dataset.map(lambda x: {
    "prompt" : [
        {"role": "system", "content": system_prompt},
        {"role": "user",   "content": x["prompt"]},
    ],
    "answer": extract_hash_answer(x["solution"]),
})
import re

# Add optional EOS token matching
solution_end_regex = rf"{reasoning_end}(.*)"
match_format = re.compile(solution_end_regex, re.DOTALL)
match_format

"""We verify it works:"""
match_format.findall(
    "Let me think!</think>"\
    "Hence, the solution is 2.",
)
match_format.findall(
    "<think>Let me think!</think>"\
    "\n\nHence, the solution is 2",
)
def match_format_exactly(completions, **kwargs):
    scores = []
    for completion in completions:
        score = 0
        response = completion[0]["content"]
        # Match if format is seen exactly!
        if match_format.search(response) is not None: score += 3.0
        scores.append(score)
    return scores
"""If it fails, we want to reward the model if it at least follows the format partially, by counting each symbol:"""
def match_format_approximately(completions, **kwargs):
scores = []
for completion in completions:
score = 0
response = completion[0]["content"]
# Count how many keywords are seen - we penalize if too many!
# If we see 1, then plus some points!
# No need to reward <think> since we always prepend it!
score += 0.5 if response.count(reasoning_start) == 1 else -1.0
score += 0.5 if response.count(reasoning_end) == 1 else -1.0
scores.append(score)
return scores
"""We want to extract the generated answer, and reward or penalize it! We also reward it based on how close the answer is to the true one via ratios:"""
def check_answer(prompts, completions, answer, **kwargs):
question = prompts[0][-1]["content"]
responses = [completion[0]["content"] for completion in completions]
extracted_responses = [
guess.group(1)
if (guess := match_format.search(r)) is not None else None \
for r in responses
]
scores = []
for guess, true_answer in zip(extracted_responses, answer):
score = 0
if guess is None:
scores.append(-2.0)
continue
# Correct answer gets 5 points!
if guess == true_answer:
score += 5.0
# Match if spaces are seen, but less reward
elif guess.strip() == true_answer.strip():
score += 3.5
else:
# We also reward it if the answer is close via ratios!
# Ie if the answer is within some range, reward it!
try:
ratio = float(guess) / float(true_answer)
if ratio >= 0.9 and ratio <= 1.1: score += 2.0
elif ratio >= 0.8 and ratio <= 1.2: score += 1.5
else: score -= 2.5 # Penalize wrong answers
except:
score -= 4.5 # Penalize
scores.append(score)
return scores
match_numbers = re.compile(
    r".*?[\s]{0,}([-]?[\d\.\,]{1,})",
    flags = re.MULTILINE | re.DOTALL
)
print(match_numbers.findall(" 0.34 "))
print(match_numbers.findall(" 123,456 "))
print(match_numbers.findall(" -0.234 "))
print(match_numbers.findall("17"))
import langid
def get_lang(text: str) -> str:
    if not text:
        return "und"
    lang, _ = langid.classify(text)
    return lang
print(get_lang("Hello, How are you")) # This should return en
print(get_lang("Aku berpikir kalau aku adalah kamu")) # This should return id
print(get_lang("我在这里")) # This should return zh
def format_and_language_reward_func(completions, **kwargs):
    scores = []
    for completion_item in completions:
        if not completion_item or not isinstance(completion_item[0], dict) or "content" not in completion_item[0]:
            scores.append(-5.0)
            print(f"Warning: Malformed completion item, assigning default low score: {completion_item}")
            continue
        content = completion_item[0]["content"]
        lang = get_lang(content)
        if lang == 'id':
            score = 5.0   # Reward responses in Bahasa Indonesia
        elif lang == 'en':
            score = -3.0  # Penalize English
        elif lang == 'zh':
            score = -3.0  # Penalize Chinese
        else:
            score = -5.0  # Penalize any other / undetected language
        scores.append(score)
    return scores
prompts = [
    [{"role": "user", "content": "What is the result of (1 + 2) * 4?"}],
    [{"role": "user", "content": "What is the result of (3 + 1) * 2?"}],
]
completions = [
    [{"role": "assistant", "content": "<think>The sum of 1 and 2 is 3, which we multiply by 4 to get 12.</think><answer>(1 + 2) * 4 = 12</answer>"}],
    [{"role": "assistant", "content": "The sum of 3 and 1 is 4, which we multiply by 2 to get 8. So (3 + 1) * 2 = 8."}],
]
format_and_language_reward_func(prompts=prompts, completions=completions)
global PRINTED_TIMES
PRINTED_TIMES = 0
global PRINT_EVERY_STEPS
PRINT_EVERY_STEPS = 5
def check_numbers(prompts, completions, answer, **kwargs):
    question = prompts[0][-1]["content"]
    responses = [completion[0]["content"] for completion in completions]

    extracted_responses = [
        guess.group(1)
        if (guess := match_numbers.search(r)) is not None else None
        for r in responses
    ]

    scores = []
    # Print only every few steps
    global PRINTED_TIMES
    global PRINT_EVERY_STEPS
    if PRINTED_TIMES % PRINT_EVERY_STEPS == 0:
        print(
            '*'*20 + f"Question:\n{question}", f"\nAnswer:\n{answer[0]}", f"\nResponse:\n{responses[0]}", f"\nExtracted:\n{extracted_responses[0]}"
        )
    PRINTED_TIMES += 1

    for guess, true_answer in zip(extracted_responses, answer):
        if guess is None:
            scores.append(-2.5)
            continue
        # Convert to numbers
        try:
            true_answer = float(true_answer.strip())
            # Remove commas like in 123,456
            guess = float(guess.strip().replace(",", ""))
            scores.append(3.5 if guess == true_answer else -1.5)
        except:
            scores.append(0)
            continue
    return scores
tokenized = dataset.map(
    lambda x: {"tokens" : tokenizer.apply_chat_template(x["prompt"], add_generation_prompt = True, tokenize = True)},
    batched = True,
)
print(tokenizer.decode(tokenized[0]["tokens"]))
tokenized = tokenized.map(lambda x: {"L" : len(x["tokens"])})
import numpy as np
maximum_length = int(np.quantile(tokenized["L"], 0.9))
print("Max Length = ", maximum_length)
# Keep only samples at or below the 90th-percentile prompt length
dataset = dataset.select(np.where(np.array(tokenized["L"]) <= maximum_length)[0])
del tokenized
max_prompt_length = maximum_length + 1 # + 1 just in case!
max_completion_length = max_seq_length - max_prompt_length
from vllm import SamplingParams
vllm_sampling_params = SamplingParams(
    min_p = 0.1,
    top_p = 1.0,
    top_k = -1,
    seed = 3407,
    stop = [tokenizer.eos_token],
    include_stop_str_in_output = True,
)
from trl import GRPOConfig, GRPOTrainer
training_args = GRPOConfig(
    vllm_sampling_params = vllm_sampling_params,
    temperature = 1.0,
    learning_rate = 5e-6,
    weight_decay = 0.01,
    warmup_ratio = 0.1,
    lr_scheduler_type = "linear",
    optim = "adamw_8bit",
    logging_steps = 1,
    per_device_train_batch_size = 1,
    gradient_accumulation_steps = 1, # Increase to 4 for smoother training
    num_generations = 4, # Decrease if out of memory
    max_prompt_length = max_prompt_length,
    max_completion_length = max_completion_length,
    # num_train_epochs = 1, # Set to 1 for a full training run
    max_steps = 100,
    save_steps = 100,
    report_to = "none", # Can use Weights & Biases
    output_dir = "outputs",

    # For optional training + evaluation
    # fp16_full_eval = True,
    # per_device_eval_batch_size = 4,
    # eval_accumulation_steps = 1,
    # eval_strategy = "steps",
    # eval_steps = 1,
)
trainer = GRPOTrainer(
    model = model,
    processing_class = tokenizer,
    reward_funcs = [
        match_format_exactly,
        match_format_approximately,
        check_answer,
        check_numbers,
        format_and_language_reward_func,
    ],
    args = training_args,
    train_dataset = dataset,

    # For optional training + evaluation
    # train_dataset = new_dataset["train"],
    # eval_dataset = new_dataset["test"],
)
trainer.train()
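After training you will usually want to save the LoRA and sample from it. The sketch below follows the pattern used in Unsloth's GRPO notebooks; the `save_lora` / `load_lora` / `fast_generate` helpers, the output path, and the example prompt are assumptions here, so adapt them to your setup.

```python
# Assumed helpers from Unsloth's GRPO notebooks: save_lora / load_lora / fast_generate
model.save_lora("grpo_saved_lora")  # hypothetical output directory

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user",   "content": "Berapakah hasil dari (1 + 2) * 4?"},  # ask in Indonesian
]
text = tokenizer.apply_chat_template(messages, tokenize = False, add_generation_prompt = True)

output = model.fast_generate(
    text,
    sampling_params = SamplingParams(temperature = 1.0, top_p = 1.0, max_tokens = 1024),
    lora_request = model.load_lora("grpo_saved_lora"),
)[0].outputs[0].text
print(output)
```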
🎯 Magistral Conversational Reasoning
- Fine-tune Magistral-24B for advanced conversational reasoning
- Magistral notebook: https://github.com/unslothai/notebooks/blob/main/nb/Magistral_(24B)-Reasoning-Conversational.ipynb
👁️ Gemma3 Vision Support
- Fine-tune Gemma3 vision models for multimodal tasks (a setup sketch follows this list)
- Gemma3 Vision notebook: https://github.com/unslothai/notebooks/blob/main/nb/Gemma3_(4B)-Vision.ipynb
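A minimal setup sketch under assumptions: the `FastVisionModel` class and its `finetune_*` flags follow Unsloth's other vision notebooks, and the model id is hypothetical, so check the linked Gemma3 Vision notebook for the exact names.

```python
from unsloth import FastVisionModel  # assumed vision API, as in Unsloth's other vision notebooks

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/gemma-3-4b-it",  # hypothetical repo id -- see the linked notebook
    load_in_4bit = True,
)
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers   = True,  # also tune the vision tower (assumed flag names)
    finetune_language_layers = True,
    r = 16,
    lora_alpha = 16,
)
# Then train on image + text pairs, e.g. with TRL's SFTTrainer as in the notebook.
```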
Documentation & Guides
- Reinforcement Learning Guide: Complete guide on RL for LLMs covering GRPO, RLHF, DPO. Check it out here: https://docs.unsloth.ai/basics/reinforcement-learning-guide
- LoRA Hyperparameters Guide: Master optimal learning rates, epochs, LoRA rank & alpha settings. Check it out here: https://docs.unsloth.ai/get-started/fine-tuning-guide/lora-hyperparameters-guide
What's Changed
- Nightly by @danielhanchen in #2448
- Added k_norm & q_norm to merged Qwen3 layers by @cblomert in #2452
- MoE Kernel by @jeromeku in #2465
- Blackwell Support by @johnnynunez in #2458
- Added missing code of conduct by @rolandtannous in #2416
- Fix readme example by @yuanzhedong in #2492
- the pixtral vision notebook fails during inference by @mmathew23 in #2466
- [1/N] Enable intel GPU for unsloth by @leizhenyuan in #2350
- [2/N] Enable intel GPU for unsloth by @leizhenyuan in #2388
- vLLM Windows CUDA support [tested] by @fenglui in #2158
- Add Sesame CSM by @mmathew23 in #2527
- Add Qwen-3 chat template and Ollama template support by @kiankyars in #2537
- Fix typos by @omahs in #2540
- Add use_rslora reference to LoraConfig initialisation by @jkumz in #2539
- TTS by @danielhanchen in #2545
- Quick fix on the CompileConfig error by @Erland366 in #2554
- Fix trust remote code by @Etherll in #2357
- fix issue with qwen3 template double quote escapes by @davedgd in #2563
- Display the model name in RoPE scaling unsupported error by @emmanuel-ferdman in #2564
- Fix Whisper, ModernBERT by @danielhanchen in #2565
- fix: improved error handling when llama.cpp build fails #2358 by @Hansehart in #2603
- Remove `dataset_text_field` from `SFTConfig` by @qgallouedec in #2609
- Upgrade trl fix by @Datta0 in #2544
- Check the `skip_prepare_dataset` before accessing dataset fields. #2496 by @Premik in #2633
- Llama4 MoE Grouped GEMM by @jeromeku in #2639
- Latest TRL, GRPO + Bug fixes by @danielhanchen in #2645
- Fix SFTtraining for new trl by @mmathew23 in #2647
- Bug fixes by @danielhanchen in #2651
- Fix quant model param fetch regex by @Datta0 in #2662
- Fix batched generation for prompts of different lengths by @RunFMe in #2216
- reroute merge logic language models + comprehensive tests + eval kits by @rolandtannous in #2673
- unsloth checkpointing fix for latest transformers==4.52.x by @mmathew23 in #2674
- patch sft_trainer to favor max_seq_length over max_length in config by @mmathew23 in #2669
- Update prepare 4d causal attention call by @mmathew23 in #2678
- Ignore None Values when building vllm subprocess_command by @Salpingopharyngeus in #2680
- add support for torch270 with Intel GPU by @leizhenyuan in #2709
- Making protobuf version more flexible by @user799595 in #2637
- tests for additional merge fix unsloth zoo pr 163 by @rolandtannous in #2719
- Reward modeling update (There seems to be another patch) by @pluesclues in #2710
- Fix Typos in Documentation and Comments by @leopardracer in #2721
- Fix renaming on other model than Llama by @Erland366 in #2762
- Enable vLLM to share memory space by @Datta0 in #2712
- Fix TRL 1.8.2 by @marcandrelarochelle in #2774
- Fix AttributeError in GRPO trainer for models without llm attribute by @rolandtannous in #2780
- Additional tests for unsloth-zoo PR#174 by @rolandtannous in #2779
- Update pyproject.toml by @amrothemich in #2778
- Fix for grpo_compute_loss_slow by @simpissa in #2702
- Fix GRPO by @danielhanchen in #2787
- Docs: Fix typo and improve MoE docstrings by @kilavvy in #2784
- [5/N] Enable intel GPU for unsloth by @leizhenyuan in #2768
- Sequence Classification Bug Fixes by @pluesclues in #2793
- intel 5/N fix patch by @mmathew23 in #2792
- [3/N] Enable intel GPU for unsloth by @leizhenyuan in #2620
- [4/N] Enable intel GPU for unsloth by @mmathew23 in #2801
- [intel] use DeviceProperties instead of torch.xxx.deviceproperties by @leizhenyuan in #2803
- Fix grpo sleep regex and indentation by @Datta0 in #2804
- Bug fixes by @danielhanchen in #2805
- Bug fixes by @danielhanchen in #2807
New Contributors
- @cblomert made their first contribution in #2452
- @johnnynunez made their first contribution in #2458
- @rolandtannous made their first contribution in #2416
- @yuanzhedong made their first contribution in #2492
- @mmathew23 made their first contribution in #2466
- @leizhenyuan made their first contribution in #2350
- @fenglui made their first contribution in #2158
- @kiankyars made their first contribution in #2537
- @omahs made their first contribution in #2540
- @jkumz made their first contribution in #2539
- @davedgd made their first contribution in #2563
- @emmanuel-ferdman made their first contribution in #2564
- @qgallouedec made their first contribution in #2609
- @Premik made their first contribution in #2633
- @RunFMe made their first contribution in #2216
- @Salpingopharyngeus made their first contribution in #2680
- @user799595 made their first contribution in #2637
- @pluesclues made their first contribution in #2710
- @leopardracer made their first contribution in #2721
- @marcandrelarochelle made their first contribution in #2774
- @amrothemich made their first contribution in #2778
- @simpissa made their first contribution in #2702
- @kilavvy made their first contribution in #2784
Full Changelog: May-2025...June-2025