unslothai/unsloth June-2025 on GitHub

✨ Gemma 3n now available

Google's new Gemma 3n multimodal models in 2B (E2B) and 4B (E4B) sizes
Supports audio, vision, video and text inputs
Available in safetensors, GGUF and dynamic 4bit BnB for finetuning.
HuggingFace Collection Link: Gemma-3n

🎵 Text-to-Speech (TTS) Fine-tuning

Train TTS/STT models like Sesame-CSM, Orpheus-TTS and OpenAI's Whisper locally!
Clone voices, learn new emotions, tones & styles with 1.5x faster training and -50% VRAM
TTS notebooks: https://docs.unsloth.ai/get-started/unsloth-notebooks#text-to-speech-tts-notebooks

Tip

Update Unsloth via pip install --upgrade --force-reinstall unsloth unsloth_zoo

🧠 DeepSeek-R1-0528 Support with Dynamic 1-bit GGUFs

Fine-tune DeepSeek-R1-0528-Qwen3 with GRPO! Our new reward function increases multilingual response rates by 40%+
Dynamic 1-bit GGUFs shrink the full 715GB model to just 185GB (-75% size) with optimal accuracy
DeepSeek-R1-0528-Qwen3 notebook: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/DeepSeek_R1_0528_Qwen3_(8B)_GRPO.ipynb

📈 Dynamic 2.0 GGUFs

New quantization method outperforms leading quantization methods
Sets new benchmarks for 5-shot MMLU and KL Divergence
Selectively quantizes layers for optimal accuracy
For more information: https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs

⚡ Advanced Qwen3 GRPO notebook

Proximity scoring for more nuanced reward functions
OpenR1 dataset support with advanced templates
Prefinetuning to skip GRPO format learning
https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_(4B)-GRPO.ipynb

# DeepSeek-R1 GRPO Fine-tuning Example: convert DeepSeek-R1-0528-Qwen3-8B into a reasoning model via GRPO by using OpenR1's Math dataset.

from unsloth import FastLanguageModel
import torch
max_seq_length = 1024 # Can increase for longer reasoning traces
lora_rank = 32 # Larger rank = smarter, but slower

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/DeepSeek-R1-0528-Qwen3-8B",
    max_seq_length = max_seq_length,
    load_in_4bit = True, # False for LoRA 16bit
    fast_inference = True, # Enable vLLM fast inference
    max_lora_rank = lora_rank,
    gpu_memory_utilization = 0.7, # Reduce if out of memory
)

model = FastLanguageModel.get_peft_model(
    model,
    r = lora_rank, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha = lora_rank*2, # *2 speeds up training
    use_gradient_checkpointing = "unsloth", # Reduces memory usage
    random_state = 3407,
)

reasoning_start = None
reasoning_end = None
user_token = None
assistant_token = None

for token in tokenizer.get_added_vocab().keys():
    if "think" in token and "/" in token:
        reasoning_end = token
    elif "think" in token:
        reasoning_start = token
    elif "user" in token:
        user_token = token
    elif "assistant" in token:
        assistant_token = token

system_prompt = \
f"""You are given a problem.
Think about the problem and provide your working out.
You must think in Bahasa Indonesia."""

print(tokenizer.apply_chat_template([
    {"role" : "user", "content" : "What is 1+1?"},
    {"role" : "assistant", "content" : f"<think>I think it's 2.2</think>2"},
    {"role" : "user", "content" : "What is 1+1?"},
    {"role" : "assistant", "content" : f"<think>I think it's 2.2</think>2"},
], tokenize = False, add_generation_prompt = True))

from datasets import load_dataset
dataset = load_dataset("open-r1/DAPO-Math-17k-Processed", "en", split = "train")

def extract_hash_answer(text):
    # if "####" not in text: return None
    # return text.split("####")[1].strip()
    return text

dataset = dataset.map(lambda x: {
    "prompt" : [
        {"role": "system", "content": system_prompt},
        {"role": "user",   "content": x["prompt"]},
    ],
    "answer": extract_hash_answer(x["solution"]),
})

# Add optional EOS token matching
solution_end_regex = rf"{reasoning_end}(.*)"

match_format = re.compile(solution_end_regex, re.DOTALL)
match_format

"""We verify it works:"""

match_format.findall(
    "Let me think!</think>"\
    f"Hence, the solution is 2.",
)

match_format.findall(
    "<think>Let me think!</think>"\
    f"\n\nHence, the solution is 2",
)

def match_format_exactly(completions, **kwargs):
    scores = []
    for completion in completions:
        score = 0
        response = completion[0]["content"]
        # Match if format is seen exactly!
        if match_format.search(response) is not None: score += 3.0
        scores.append(score)
    return scores

"""If it fails, we want to reward the model if it at least follows the format partially, by counting each symbol:"""

def match_format_approximately(completions, **kwargs):
    scores = []
    for completion in completions:
        score = 0
        response = completion[0]["content"]
        # Count how many keywords are seen - we penalize if too many!
        # If we see 1, then plus some points!

        # No need to reward <think> since we always prepend it!
        score += 0.5 if response.count(reasoning_start) == 1 else -1.0
        score += 0.5 if response.count(reasoning_end)   == 1 else -1.0
        scores.append(score)
    return scores

"""We want to extract the generated answer, and reward or penalize it! We also reward it based on how close the answer is to the true one via ratios:"""

def check_answer(prompts, completions, answer, **kwargs):
    question = prompts[0][-1]["content"]
    responses = [completion[0]["content"] for completion in completions]

    extracted_responses = [
        guess.group(1)
        if (guess := match_format.search(r)) is not None else None \
        for r in responses
    ]

    scores = []
    for guess, true_answer in zip(extracted_responses, answer):
        score = 0
        if guess is None:
            scores.append(-2.0)
            continue
        # Correct answer gets 5 points!
        if guess == true_answer:
            score += 5.0
        # Match if spaces are seen, but less reward
        elif guess.strip() == true_answer.strip():
            score += 3.5
        else:
            # We also reward it if the answer is close via ratios!
            # Ie if the answer is within some range, reward it!
            try:
                ratio = float(guess) / float(true_answer)
                if   ratio >= 0.9 and ratio <= 1.1: score += 2.0
                elif ratio >= 0.8 and ratio <= 1.2: score += 1.5
                else: score -= 2.5 # Penalize wrong answers
            except:
                score -= 4.5 # Penalize
        scores.append(score)
    return scores

match_numbers = re.compile(
    r".*?[\s]{0,}([-]?[\d\.\,]{1,})",
    flags = re.MULTILINE | re.DOTALL
)
print(match_numbers.findall("  0.34  "))
print(match_numbers.findall("  123,456  "))
print(match_numbers.findall("  -0.234  "))
print(match_numbers.findall("17"))

import langid

def get_lang(text: str) -> str:
    if not text:
        return "und"
    lang, _ = langid.classify(text)
    return lang


print(get_lang("Hello, How are you")) # This should return en
print(get_lang("Aku berpikir kalau aku adalah kamu")) # This should return id
print(get_lang("我在这里")) # This should return zh


import re

def format_and_language_reward_func(completions, **kwargs):
    scores = []

    for completion_item in completions:
        if not completion_item or not isinstance(completion_item[0], dict) or "content" not in completion_item[0]:
            scores.append(-5.0)
            print(f"Warning: Malformed completion item, assigning default low score: {completion_item}")
            continue

        content = completion_item[0]["content"]

        lang = get_lang(content)

        if lang == 'id':
            score = 5.0
        elif lang == 'en':
            score = -3.0
        elif lang == 'zh':
            score = -3.0
        else:
            score = -5.0

        scores.append(score)

    return scores

prompts = [
    [{"role": "assistant", "content": "What is the result of (1 + 2) * 4?"}],
    [{"role": "assistant", "content": "What is the result of (3 + 1) * 2?"}],
]
completions = [
    [{"role": "assistant", "content": "<think>The sum of 1 and 2 is 3, which we multiply by 4 to get 12.</think><answer>(1 + 2) * 4 = 12</answer>"}],
    [{"role": "assistant", "content": "The sum of 3 and 1 is 4, which we multiply by 2 to get 8. So (3 + 1) * 2 = 8."}],
]
format_and_language_reward_func(prompts=prompts, completions=completions)

global PRINTED_TIMES
PRINTED_TIMES = 0
global PRINT_EVERY_STEPS
PRINT_EVERY_STEPS = 5

def check_numbers(prompts, completions, answer, **kwargs):
    question = prompts[0][-1]["content"]
    responses = [completion[0]["content"] for completion in completions]

    extracted_responses = [
        guess.group(1)
        if (guess := match_numbers.search(r)) is not None else None \
        for r in responses
    ]

    scores = []
    # Print only every few steps
    global PRINTED_TIMES
    global PRINT_EVERY_STEPS
    if PRINTED_TIMES % PRINT_EVERY_STEPS == 0:
        print(
            '*'*20 + f"Question:\n{question}", f"\nAnswer:\n{answer[0]}", f"\nResponse:\n{responses[0]}", f"\nExtracted:\n{extracted_responses[0]}"
        )
    PRINTED_TIMES += 1

    for guess, true_answer in zip(extracted_responses, answer):
        if guess is None:
            scores.append(-2.5)
            continue
        # Convert to numbers
        try:
            true_answer = float(true_answer.strip())
            # Remove commas like in 123,456
            guess       = float(guess.strip().replace(",", ""))
            scores.append(3.5 if guess == true_answer else -1.5)
        except:
            scores.append(0)
            continue
    return scores

tokenized = dataset.map(
    lambda x: {"tokens" : tokenizer.apply_chat_template(x["prompt"], add_generation_prompt = True, tokenize = True)},
    batched = True,
)
print(tokenizer.decode(tokenized[0]["tokens"]))
tokenized = tokenized.map(lambda x: {"L" : len(x["tokens"])})

import numpy as np
maximum_length = int(np.quantile(tokenized["L"], 0.9))
print("Max Length = ", maximum_length)

# Filter only samples smaller than 90% max length
dataset = dataset.select(np.where(np.array(tokenized["L"]) <= maximum_length)[0])
del tokenized

max_prompt_length = maximum_length + 1 # + 1 just in case!
max_completion_length = max_seq_length - max_prompt_length

from vllm import SamplingParams
vllm_sampling_params = SamplingParams(
    min_p = 0.1,
    top_p = 1.0,
    top_k = -1,
    seed = 3407,
    stop = [tokenizer.eos_token],
    include_stop_str_in_output = True,
)

from trl import GRPOConfig, GRPOTrainer
training_args = GRPOConfig(
    vllm_sampling_params = vllm_sampling_params,
    temperature = 1.0,
    learning_rate = 5e-6,
    weight_decay = 0.01,
    warmup_ratio = 0.1,
    lr_scheduler_type = "linear",
    optim = "adamw_8bit",
    logging_steps = 1,
    per_device_train_batch_size = 1,
    gradient_accumulation_steps = 1, # Increase to 4 for smoother training
    num_generations = 4, # Decrease if out of memory
    max_prompt_length = max_prompt_length,
    max_completion_length = max_completion_length,
    # num_train_epochs = 1, # Set to 1 for a full training run
    max_steps = 100,
    save_steps = 100,
    report_to = "none", # Can use Weights & Biases
    output_dir = "outputs",

    # For optional training + evaluation
    # fp16_full_eval = True,
    # per_device_eval_batch_size = 4,
    # eval_accumulation_steps = 1,
    # eval_strategy = "steps",
    # eval_steps = 1,
)

trainer = GRPOTrainer(
    model = model,
    processing_class = tokenizer,
    reward_funcs = [
        match_format_exactly,
        match_format_approximately,
        check_answer,
        check_numbers,
        format_and_language_reward_func,
    ],
    args = training_args,
    train_dataset = dataset,

    # For optional training + evaluation
    # train_dataset = new_dataset["train"],
    # eval_dataset = new_dataset["test"],
)
trainer.train()

🎯 Magistral Conversational Reasoning

Fine-tune Magistral-24B for advanced conversational reasoning
Magistral notebook: https://github.com/unslothai/notebooks/blob/main/nb/Magistral_(24B)-Reasoning-Conversational.ipynb

👁️ Gemma3 Vision Support

Fine-tune Gemma3 vision models for multimodal tasks
Gemma3 Vision notebook: https://github.com/unslothai/notebooks/blob/main/nb/Gemma3_(4B)-Vision.ipynb

Documentation & Guides

Reinforcement Learning Guide: Complete guide on RL for LLMs covering GRPO, RLHF, DPO. Check it out here: https://docs.unsloth.ai/basics/reinforcement-learning-guide
LoRA Hyperparameters Guide: Master optimal learning rates, epochs, LoRA rank & alpha settings, Check it out here: https://docs.unsloth.ai/get-started/fine-tuning-guide/lora-hyperparameters-guide

What's Changed

Nightly by @danielhanchen in #2448
Added k_norm & q_norm to merged Qwen3 layers by @cblomert in #2452
MoE Kernel by @jeromeku in #2465
Blackwell Support by @johnnynunez in #2458
Added missing code of conduct by @rolandtannous in #2416
Fix readme example by @yuanzhedong in #2492
the pixtral vision notebook fails during inference by @mmathew23 in #2466
[1/N] Enable intel GPU for unsloth by @leizhenyuan in #2350
[2/N] Enable intel GPU for unsloth by @leizhenyuan in #2388
vLLM Windows CUDA support [tested] by @fenglui in #2158
Add Sesame CSM by @mmathew23 in #2527
Add Qwen-3 chat template and Ollama template support by @kiankyars in #2537
Fix typos by @omahs in #2540
Add use_rslora reference to LoraConfig inititalisation by @jkumz in #2539
TTS by @danielhanchen in #2545
Quick fix on the CompileConfig error by @Erland366 in #2554
Fix trust remote code by @Etherll in #2357
fix issue with qwen3 template double quote escapes by @davedgd in #2563
Display the model name in RoPE scaling unsupported error by @emmanuel-ferdman in #2564
Fix Whisper, ModernBERT by @danielhanchen in #2565
fix: improved error handling when llama.cpp build fails #2358 by @Hansehart in #2603
Remove dataset_text_field from SFTConfig by @qgallouedec in #2609
Upgrade trl fix by @Datta0 in #2544
Check the skip_prepare_dataset before accessing dataset fields. #2496 by @Premik in #2633
Llama4 MoE Grouped GEMM by @jeromeku in #2639
Latest TRL, GRPO + Bug fixes by @danielhanchen in #2645
Fix SFTtraining for new trl by @mmathew23 in #2647
Bug fixes by @danielhanchen in #2651
Fix quant model param fetch regex by @Datta0 in #2662
Fix batched generation for prompts of different lengths by @RunFMe in #2216
reroute merge logic language models + comprehensive tests + eval kits by @rolandtannous in #2673
unsloth checkpointing fix for latest transformers==4.52.x by @mmathew23 in #2674
patch sft_trainer to favor max_seq_length over max_length in config by @mmathew23 in #2669
Update prepare 4d causal attention call by @mmathew23 in #2678
Ignore None Values when building vllm subprocess_command by @Salpingopharyngeus in #2680
add support for torch270 with Intel GPU by @leizhenyuan in #2709
Making protobuf version more flexible by @user799595 in #2637
tests for additional merge fix unsloth zoo pr 163 by @rolandtannous in #2719
Reward modeling update (There seems to be another patch) by @pluesclues in #2710
Fix Typos in Documentation and Comments by @leopardracer in #2721
Fix renaming on other model than Llama by @Erland366 in #2762
Enable vLLM to share memory space by @Datta0 in #2712
Fix TRL 1.8.2 by @marcandrelarochelle in #2774
Fix AttributeError in GRPO trainer for models without llm attribute by @rolandtannous in #2780
Additional tests for unsloth-zoo PR#174 by @rolandtannous in #2779
Update pyproject.toml by @amrothemich in #2778
Fix for grpo_compute_loss_slow by @simpissa in #2702
Fix GRPO by @danielhanchen in #2787
Docs: Fix typo and improve MoE docstrings by @kilavvy in #2784
[5/N] Enable intel GPU for unsloth by @leizhenyuan in #2768
Sequence Classification Bug Fixes by @pluesclues in #2793
intel 5/N fix patch by @mmathew23 in #2792
[3/N] Enable intel GPU for unsloth by @leizhenyuan in #2620
[4/N] Enable intel GPU for unsloth by @mmathew23 in #2801
[intel] use DeviceProperties instead of torch.xxx.deviceproperties by @leizhenyuan in #2803
Fix grpo sleep regex and indentation by @Datta0 in #2804
Bug fixes by @danielhanchen in #2805
Bug fixes by @danielhanchen in #2807

New Contributors

@cblomert made their first contribution in #2452
@johnnynunez made their first contribution in #2458
@rolandtannous made their first contribution in #2416
@yuanzhedong made their first contribution in #2492
@mmathew23 made their first contribution in #2466
@leizhenyuan made their first contribution in #2350
@fenglui made their first contribution in #2158
@kiankyars made their first contribution in #2537
@omahs made their first contribution in #2540
@jkumz made their first contribution in #2539
@davedgd made their first contribution in #2563
@emmanuel-ferdman made their first contribution in #2564
@qgallouedec made their first contribution in #2609
@Premik made their first contribution in #2633
@RunFMe made their first contribution in #2216
@Salpingopharyngeus made their first contribution in #2680
@user799595 made their first contribution in #2637
@pluesclues made their first contribution in #2710
@leopardracer made their first contribution in #2721
@marcandrelarochelle made their first contribution in #2774
@amrothemich made their first contribution in #2778
@simpissa made their first contribution in #2702
@kilavvy made their first contribution in #2784

Full Changelog: May-2025...June-2025

unslothai/unsloth June-2025 Gemma 3n + Text-to-speech (TTS) on GitHub

✨ Gemma 3n now available

🎵 Text-to-Speech (TTS) Fine-tuning

🧠 DeepSeek-R1-0528 Support with Dynamic 1-bit GGUFs

📈 Dynamic 2.0 GGUFs

⚡ Advanced Qwen3 GRPO notebook

Documentation & Guides

What's Changed

New Contributors

unslothai/unsloth June-2025
Gemma 3n + Text-to-speech (TTS)

on GitHub