unslothai/unsloth v0.1.405-beta on GitHub

We've got lots of new updates. Please use the latest Unsloth v0.1.405-beta, not v0.1.40-beta, which is older.

Up to ~2x faster GGUF inference with automatically enabled MTP (multi-token prediction) speculative decoding
API support for OpenAI, Anthropic, etc. with automatic prompt caching, web search and code execution
Connect to external inference backends: vLLM, Ollama, llama-server
Experimental MLX inference
Proper support for non-English languages
Security improvements

MTP speculative decoding support: 1.4 to 2x faster inference!

Auto MTP speculative decoding for MTP GGUFs; warn when the bundled llama.cpp prebuilt is stale or too old for MTP
New pre-built llama.cpp binaries for MTP support!
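In MTP speculative decoding, the model's multi-token-prediction head drafts a few tokens ahead and the main model verifies them, keeping the longest prefix it agrees with; whenever drafts are accepted, one verification step yields several tokens, which is where the speedup comes from. Below is a minimal sketch of greedy draft-and-verify decoding with toy callables. `speculative_generate`, `target` and `draft` are hypothetical names, not Unsloth or llama.cpp APIs, and a real engine scores all drafted positions in one batched forward pass rather than one call per position.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in: a "model" maps a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_generate(
    target: NextToken,
    draft: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    k: int = 3,
) -> Tuple[List[int], int]:
    """Greedy draft-and-verify decoding. Returns (tokens incl. prompt, verify rounds)."""
    tokens = list(prompt)
    rounds = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        rounds += 1
        # 1) Draft k tokens cheaply (the role an MTP head plays).
        drafted: List[int] = []
        for _ in range(k):
            drafted.append(draft(tokens + drafted))
        # 2) Verify with the target model, keeping the longest agreeing prefix.
        #    A real engine scores all k positions in one batched forward pass.
        accepted: List[int] = []
        for tok in drafted:
            expected = target(tokens + accepted)
            accepted.append(expected)  # the target's own token is always kept
            if tok != expected:
                break  # first mismatch: discard the remaining drafts
        else:
            # Every draft matched, so the verify pass also yields one bonus token.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    # The last round may overshoot; trim to the requested length.
    return tokens[: len(prompt) + max_new_tokens], rounds


# Toy models: the target counts up by one; the draft is wrong after multiples of 5.
def target(seq: List[int]) -> int:
    return seq[-1] + 1


def draft(seq: List[int]) -> int:
    return seq[-1] + 2 if seq[-1] % 5 == 0 else seq[-1] + 1


out, rounds = speculative_generate(target, draft, [0], max_new_tokens=10, k=3)
print(out, rounds)
```

With these toy models the output is identical to plain greedy decoding with the target (tokens 0 through 10), but it takes 4 verification rounds instead of 10. Greedy verification never changes the result; it only changes how many expensive target passes are needed.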

API provider calling & external connections

You can now connect Unsloth to any API cloud provider (OpenAI, Anthropic, OpenRouter etc.)
Built-in web search for OpenAI, Anthropic, OpenRouter and Kimi
Built-in code execution for OpenAI and Anthropic (Anthropic containers persist and are reused across turns)
Prompt caching is enabled for OpenAI and Anthropic models, which can cut the cost of cached input tokens by 50 to 90%.
API key now optional for local providers (llama.cpp / vLLM / Ollama)
Auto-load models when adding a cloud provider
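Local OpenAI-compatible servers such as llama-server, vLLM and Ollama run without authentication by default, while cloud providers reject unauthenticated requests. A minimal sketch of how a client could build request headers so that `Authorization` is sent only when a key is configured, and a missing key fails early only for cloud providers. `build_headers` and `LOCAL_PROVIDERS` are hypothetical names for illustration, not Studio's code, and the sketch covers OpenAI-compatible bearer auth only (Anthropic's native API uses a different header).

```python
from typing import Dict, Optional

# Hypothetical provider identifiers; these local backends run without auth by default.
LOCAL_PROVIDERS = {"llama.cpp", "vllm", "ollama"}


def build_headers(provider: str, api_key: Optional[str]) -> Dict[str, str]:
    """Return HTTP headers for an OpenAI-compatible chat completions request."""
    headers = {"Content-Type": "application/json"}
    key = (api_key or "").strip()
    if key:
        # Bearer auth is the OpenAI-compatible convention.
        headers["Authorization"] = f"Bearer {key}"
    elif provider not in LOCAL_PROVIDERS:
        # Cloud providers cannot be called without credentials: fail before sending.
        raise ValueError(f"provider {provider!r} requires an API key")
    return headers
```

Treating a whitespace-only key as "no key" avoids sending a malformed `Bearer ` header to a local server that would otherwise accept the request.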

MLX inference (Experimental)

MLX quants and models can now run locally on your Mac!
We'll be adding thinking, tools and web search soon!

Other Unsloth Studio updates

OpenDocument chat attachments
o3 reasoning summary payload
Sending/prompting non-English languages (e.g. Japanese, Chinese) now works properly
IME composer hardening, RTL dir="auto", long log-line truncation fix
Tool reasoning trace rendering in UI
Fully offline support: cached GGUF discovery and offline DNS auto-detect for both inference and training
Lots of UI/UX polish: dark theme refactor, right sidebar redesign, time-of-day sloth mascot, dismissible copyable toasts, larger chat composer, code-execution config polish, composer action pill styling, narrower Discord button
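Offline auto-detection can be approximated by checking whether the Hugging Face Hub hostname resolves and, if it does not, switching into offline mode with `HF_HUB_OFFLINE=1` so that only cached files (such as already-downloaded GGUFs) are used. This is a minimal sketch under that assumption, not Studio's actual logic: `detect_offline` is a hypothetical helper, and the resolver is injectable so the check can be exercised without network access.

```python
import os
import socket
from typing import Callable, MutableMapping


def detect_offline(
    env: MutableMapping[str, str] = os.environ,
    host: str = "huggingface.co",
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> bool:
    """Set HF_HUB_OFFLINE=1 if `host` cannot be resolved; return True when offline."""
    if env.get("HF_HUB_OFFLINE") == "1":
        return True  # The user already forced offline mode.
    try:
        resolve(host)
    except OSError:
        # socket.gaierror is a subclass of OSError, so DNS failures land here.
        env["HF_HUB_OFFLINE"] = "1"
        return True
    return False
```

Because `huggingface_hub` typically reads `HF_HUB_OFFLINE` when it is imported, a check like this would need to run early, before the Hub library is loaded.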

Training updates

Gemma attention mask fixes
Multi Image GRPO
GRPO hidden-state return experiments
New Continued Pretraining (CPT) training method as a first-class option
Gemma-4 MoE LoRA extractor registered to fix grouped_mm contraction crash
Opt-in fused lm_head + cross-entropy forward, with single-matmul path under UNSLOTH_RETURN_LOGITS=1
Pass batch size for eval
Eval/training paths now honour HF_DATASETS_OFFLINE alongside HF_HUB_OFFLINE
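Honouring `HF_DATASETS_OFFLINE` alongside `HF_HUB_OFFLINE` amounts to cross-syncing the two variables, so that setting either one puts both `huggingface_hub` and `datasets` into offline mode. A minimal sketch of that idea; `sync_offline_env` is a hypothetical name and the set of accepted truthy spellings is an assumption.

```python
from typing import MutableMapping

OFFLINE_VARS = ("HF_HUB_OFFLINE", "HF_DATASETS_OFFLINE")
TRUTHY = {"1", "true", "yes", "on"}  # assumed accepted spellings


def sync_offline_env(env: MutableMapping[str, str]) -> bool:
    """If any offline variable is truthy, set all of them to "1". Return the offline state."""
    offline = any(env.get(name, "").strip().lower() in TRUTHY for name in OFFLINE_VARS)
    if offline:
        for name in OFFLINE_VARS:
            env[name] = "1"
    return offline
```

Normalising every variable to `"1"` means downstream code only has to check one spelling.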

Unsloth Studio security improvements

Authentication rate-limiting, proxy-aware so reverse proxies don't bypass it
Sandboxed worker with a tightened blocklist (bash, hf upload, NOFILE)
Path containment so workers can't escape their in-flight tmp dirs
Strict schema validation across the Studio API
Tightened CSP / security headers (only legitimate favicon hosts allowed)
Removed the torch.load fallback on training_args.bin so untrusted pickles can never execute on model load
Hardened Tauri desktop release flow
Frontend auth: singleflight token refresh, current-password input on changes, working logout, shared 422 helper
Cancel cleanup now scoped strictly to in-flight tmp dirs so it can never delete user state
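Path containment generally means resolving a candidate path (collapsing `..` segments and following symlinks) and rejecting it unless the result still lies inside an allowed root, such as a worker's in-flight tmp dir. A minimal sketch using `pathlib`; `resolve_within` is a hypothetical helper, and a production check would also have to consider races between the check and the later file operation.

```python
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def resolve_within(root: PathLike, candidate: PathLike) -> Path:
    """Resolve `candidate` relative to `root` and refuse anything that escapes it."""
    root_path = Path(root).resolve()
    # An absolute candidate replaces root entirely when joined; resolve() then exposes it.
    target = (root_path / candidate).resolve()
    if not target.is_relative_to(root_path):  # Path.is_relative_to needs Python 3.9+
        raise PermissionError(f"{candidate!s} escapes {root_path}")
    return target
```

Checking the resolved path rather than the raw string is what catches `job/../../x` and symlink tricks that a simple prefix test would miss.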

Bug fixes and correctness

Layout-aware MoE LoRA merge with loud-fail on fallback (no more silent wrong saves)
num_logits_to_keep regression fixed on transformers >= 4.52
Preserve tokenizer EOS token on merged saves
Resume PEFT checkpoints under sentence-transformers >= 5.4
Restore Flash > SDPA > Flex attention priority for non-Gemma3 models
ORPO text-only tokenization now works with processors
Embedding matrix size mismatch fix
Vicuna chat template fix
fast_generate unifies legacy and new logits kwargs (fixes Mistral merge site)
higher_precision_softmax made idempotent
Patch every LOSS_MAPPING key aliased to ForCausalLMLoss (covers transformers 5.x)
GGUF converter sibling imports fixed
UTF-8 encoding added to all text-mode file operations
Serialise GGUF reload and inherit unsloth-run extra args
Fix /recommended-folders 500 on unreadable model directories under Python 3.12+
Cross-family GGUF projector blocked in flat local dirs (no more wrong-vision-tower loads)
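The restored attention priority for non-Gemma3 models is Flash > SDPA > Flex. Backend selection of this kind typically walks an ordered preference list and picks the first backend that is available. A minimal sketch; `pick_attention_backend` is hypothetical, the labels are borrowed from transformers' `attn_implementation` values, and because these notes do not say how Gemma3 is handled, the sketch only lets a caller pass a different order.

```python
from typing import Iterable, Sequence

# Priority for non-Gemma3 models, per the release notes: Flash > SDPA > Flex.
DEFAULT_PRIORITY: Sequence[str] = ("flash_attention_2", "sdpa", "flex_attention")


def pick_attention_backend(
    available: Iterable[str],
    priority: Sequence[str] = DEFAULT_PRIORITY,
    fallback: str = "eager",
) -> str:
    """Return the highest-priority backend that is available, else `fallback`."""
    available_set = set(available)
    for backend in priority:
        if backend in available_set:
            return backend
    return fallback
```

Keeping the order in one explicit tuple makes a priority regression easy to spot in review and to pin in a test.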

Installer and platform reliability

Custom install paths via STUDIO_HOME / UNSLOTH_STUDIO_HOME
CPU-only Linux x86_64 routed to ggml-org/llama.cpp prebuilts
Windows CUDA install fixes: paired cudart bundle and Torch NVIDIA DLL paths added to PATH
Skip flash-attn install on Blackwell GPUs (sm_100+)
Refresh Intel XPU extras for torch 2.7.1 / 2.9.1 / 2.10 / 2.11.0 / 2.12.0; torch upper cap raised to <2.13.0
HIP source builds on Ubuntu 24.04 now inject --gcc-install-dir
Linux prebuilt fixes for branch-based llama.cpp releases (mangled symlink repair, top-level dir strip)
New uninstallers for Linux, macOS (uninstall.sh) and Windows (uninstall.ps1)
Mac desktop shortcut spawning and lifecycle fixed
unsloth --version flag
Studio web update banner and release version display
GPU pinned at 95% headroom, with a warning on silent CPU fallback
Auto-install flash-linear-attention and tilelang for Qwen3.5 family
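Blackwell GPUs report CUDA compute capability 10.0 or higher (sm_100+), so an installer can gate the flash-attn install on the device's capability. A minimal sketch; `should_install_flash_attn` is a hypothetical helper, and on a machine with PyTorch the `(major, minor)` tuple would come from `torch.cuda.get_device_capability()`.

```python
from typing import Tuple


def should_install_flash_attn(capability: Tuple[int, int]) -> bool:
    """Return False for sm_100+ (Blackwell and newer), True for older CUDA GPUs."""
    major, minor = capability
    sm = major * 10 + minor  # e.g. (8, 0) -> sm_80, (10, 0) -> sm_100, (12, 0) -> sm_120
    return sm < 100
```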

What's Changed in Unsloth

Bump installer floor to 2026.5.2 by @danielhanchen in #5297
install: support STUDIO_HOME / UNSLOTH_STUDIO_HOME for custom install paths by @danielhanchen in #5190
Route CPU-only Linux x86_64 to ggml-org/llama.cpp prebuilts by @danielhanchen in #5302
feat(studio): MLX training tab on Apple Silicon (LoRA / full FT, VLM, export) by @Manan17 in #5265
feat(studio): add Continued Pretraining (CPT) as a training method by @OnePunchMonk in #4677
Fix 14 stale tests under tests/studio/install/ that drifted from code by @danielhanchen in #5305
Add Studio PR-time CI: pin enforcement, frontend, backend, wheel smoke by @danielhanchen in #5298
Studio: restore Studio API and Help menu UI by @Imagineer99 in #5310
[studio]: Fix tool reasoning trace in UI by @CodeMan62 in #5314
fix: 3 patch_* helpers — fast_lora import, sft_trainer Union, openenv OSError by @danielhanchen in #5319
Studio: API settings overflow with long Colab URLs by @Imagineer99 in #5286
tests/studio/install: parallel UNSLOTH_STUDIO_HOME smoke test by @danielhanchen in #5306
Studio: Dark theme refactor, right sidebar redesign, and chat UI polish by @Imagineer99 in #5150
fix: harden Studio IME composer sends by @Etherll in #5327
Studio: stop truncating long log lines as suspected base64 by @rolandtannous in #5335
fix(gh_client): fail fast on 401/403 auth errors instead of retrying forever (#5325) by @Anai-Guo in #5329
fix: unblock 4 tests deselected/skipped in #5312 (real bugs) by @danielhanchen in #5359
fix(tests/sh): accept pinned tokenizers line after #5359 by @danielhanchen in #5361
CI: scope GITHUB_TOKEN permissions, add MLX CI, unblock ~60 skipped tests by @danielhanchen in #5312
studio/tests: make Playwright model-selector probe best-effort by @danielhanchen in #5371
Studio: download paired cudart bundle on Windows CUDA installs by @danielhanchen in #5322
Studio: add torch's pip nvidia DLL dirs to PATH on Windows by @danielhanchen in #5324
studio: authenticate HF downloads across Studio CI workflows by @danielhanchen in #5370
dependabot: group security updates and cover /studio/frontend npm advisories by @danielhanchen in #5372
Add Studio web update banner and release version display by @wasimysaid in #5308
ci/install: retry transient github.com 5xx on unsloth-zoo git fetches by @danielhanchen in #5389
studio/ci: pre-install lockfile supply-chain audit (npm + cargo) by @danielhanchen in #5392
studio/ci: npm tarball content scanner (no-install, hostile-input safe) by @danielhanchen in #5393
studio/tests: AbortSignal-bound in-page fetches and wall-clock watchdog for Playwright probes by @danielhanchen in #5391
chore: remove unused .semgrep/unsloth-rules.yml by @danielhanchen in #5395
studio/ci: sweep actions/cache v5 hardening across sibling smoke workflows by @danielhanchen in #5399
studio/ci: harden HF_HOME cache against actions/cache v5 silent restore failures by @danielhanchen in #5396
Harden Tauri release flow by @wasimysaid in #5341
Gemma attn by @Datta0 in #5346
Multi Image GRPO by @Datta0 in #5197
[GRPO] Try returning hidden states for GRPO by @Datta0 in #5142
Studio: pin GPU at 95% headroom and warn on silent CPU fallback by @danielhanchen in #5323
Chore(deps): bump the actions group across 1 directory with 4 updates by @dependabot[bot] in #5394
security: NOT affected by Mini Shai-Hulud (May-12 wave) -- forward-looking hardening only by @danielhanchen in #5397
studio: security and hardening pass (auth rate-limit, sandbox, path containment, schema validation, headers) by @danielhanchen in #5375
studio: fix training page regressions from the security hardening pass by @rolandtannous in #5409
Studio: parity of thinking trace icon with Think toggle icon by @Imagineer99 in #5407
Studio: vary empty chat sloth mascot by local time of day by @Imagineer99 in #5354
security: persist-credentials:false on every actions/checkout (org-wide sweep) by @danielhanchen in #5413
import_fixes: stub transformers.conversion_mapping so peft 0.19.x imports on transformers 4.x by @danielhanchen in #5416
chore: trim verbose comments added in PR #5416 (commit 12295c1) by @danielhanchen in #5418
studio/ci: flat GGUF+mmproj cache for Mac json-images smoke, save partial caches on cancel by @danielhanchen in #5417
studio: comment out training_args.bin torch.load fallback in model_config by @danielhanchen in #5419
tests: import_fixes drift detectors (HARD GATE on Core matrix) by @danielhanchen in #5414
tests: drift detector parity with unsloth-zoo (fix Core matrix RED on triton + vllm) by @danielhanchen in #5421
scripts: ship deterministic comment / docstring-only diff verifier by @danielhanchen in #5422
studio: API external provider support for chat (OpenAI, Mistral, Gemini, Cohere, Anthropic, OpenRouter, DeepSeek, custom providers) by @rolandtannous in #4706
import_fixes + drift detectors: cover transformers 5.x drift (unblocks PR #5376) by @danielhanchen in #5423
MLX training support for Studio on Apple Silicon by @mmathew23 in #5340
studio: drop unused max_grad_value schema + route plumbing by @danielhanchen in #5424
Studio: Passing batch size for eval by @uderbashi in #5168
studio: skip flash-attn install on Blackwell GPUs (sm_100+) by @rolandtannous in #5420
Fix: Add missing utf-8 encoding to text-mode file operations by @Tenith01 in #5356
Fix/issue 3667 vicuna template by @Tenith01 in #5357
tests: public-api surface drift detector (companion to test_import_fixes_drift.py) by @danielhanchen in #5428
add UNSLOTH_ALLOW_CPU=1 path for CPU-only CI / source-inspection tests by @danielhanchen in #5429
fix(studio/mmproj): block cross-family projectors in flat local GGUF dirs (#5347) by @Anai-Guo in #5350
studio/mmproj: skip unwanted GGUF values via seek instead of read by @danielhanchen in #5431
ci: install ipython so transformers.utils.notebook imports cleanly in zoo pytest by @danielhanchen in #5437
studio/mlx: lower per-element grad clip default from 5.0 to 1.0 by @danielhanchen in #5440
studio/frontend: drop unused next dependency by @danielhanchen in #5438
Update version-compat-ci.yml by @rolandtannous in #5445
ci: merge duplicate with: keys in notebooks-ci checkout steps by @rolandtannous in #5447
studio/chat: built-in web search for OpenAI, Anthropic, OpenRouter, Kimi by @rolandtannous in #5443
ci: make compiler-cache shim test order-independent by @danielhanchen in #5449
Studio: o3 reasoning summary payload by @Imagineer99 in #5426
ci: compiler-cache-shim must mutate live module globals + skip rerun by @danielhanchen in #5452
Polish/cloud to providers by @Imagineer99 in #5450
ci: cap each compiler-sweep iteration with SIGALRM + log progress by @danielhanchen in #5456
ci: add tx >=5,<6 slow compile model_types to KNOWN_BROKEN_COMPILE by @danielhanchen in #5458
Restore Flash > SDPA > Flex priority for non-gemma3 models by @mmathew23 in #5455
ci: stop a partial mmproj cache from poisoning Mac Studio GGUF CI by @danielhanchen in #5459
ci: make Windows Stop Studio teardown tolerate Git Bash signal exit by @danielhanchen in #5460
Studio: make API key optional for local providers (llama.cpp/vLLM/Ollama) by @Imagineer99 in #5457
studio/chat: built-in code execution for OpenAI + Anthropic by @rolandtannous in #5461
ci: switch Windows Stop Studio to a cmd no-op marker by @danielhanchen in #5462
tests: raise pwsh/bash subprocess timeout from 10s to 60s by @danielhanchen in #5463
studio/install: repair upstream llama.cpp prebuilt mangled symlinks by @danielhanchen in #5465
studio/chat: OpenAI container picker delete reliability by @rolandtannous in #5466
studio/install: strip top-level dir from repaired symlink target by @danielhanchen in #5467
Stop: drop Ollama API key, clean up code execution UI by @Imagineer99 in #5464
tests/openai: patch httpx.AsyncClient ctor so delete tests hit mock by @danielhanchen in #5469
revert: stop touching DEVICE_TYPE == cuda branches for CPU CI by @danielhanchen in #5473
ci: drop cache: 'npm' from setup-node (silent abort on Windows) by @danielhanchen in #5474
ci: bump Mac json-images timeout 30 -> 45 min (cache-miss path) by @danielhanchen in #5475
ci: wrap hf download in xet-tuned stall-retry loop (root-cause Mac 30-min hang) by @danielhanchen in #5476
ci: deterministic check for studio/frontend dep removals by @danielhanchen in #5478
studio/frontend: drop unused dependencies, move type pkg to devDeps by @danielhanchen in #5477
intel-gpu: refresh xpu extras (fix torch 2.10, add 2.7.1 / 2.9.1 / 2.11.0 / 2.12.0) by @danielhanchen in #5484
Studio: auto-load models when adding a cloud provider by @Imagineer99 in #5472
Studio: code execution config visual polish by @Imagineer99 in #5471
disable_torchcodec_if_broken: also patch datasets and clean sys.modules by @danielhanchen in #5483
tests: pinned-symbol canary for unsloth-zoo save_pretrained_merged guards (#5410) by @danielhanchen in #5433
intel-gpu: pin unsloth_zoo>=2026.5.2 (fixes #5494) by @danielhanchen in #5499
fix(sentence_transformer): resume PEFT checkpoints under sentence-transformers >= 5.4 by @Etherll in #5454
Studio: serialise GGUF reload and inherit unsloth-run extra args by @danielhanchen in #5427
Studio: IME / multilingual composer regression test + RTL dir="auto" by @danielhanchen in #5485
fix: preserve tokenizer eos token on merged saves by @anmolxlight in #5451
studio/chat: reuse Anthropic code_execution container across turns by @rolandtannous in #5519
Fix Linux prebuilt installs for branch-based llama.cpp releases by @mmathew23 in #5493
Studio: stop hint, Uvicorn log rename, reachability check + Mac UI CI retry hardening by @danielhanchen in #5503
Studio composer action pill styling by @Imagineer99 in #5522
Fix /recommended-folders 500 on unreadable model directories (Python 3.12+) by @mmathew23 in #5523
studio/chat: persist Anthropic container id on first turn of new thread by @rolandtannous in #5526
studio/openai: align chat completions docstring with stream=false default (closes #5047) by @wtfashwin in #5524
Add a simple --version flag by @melroy89 in #5516
studio: load cached GGUF models when fully offline by @shimmyshimmer in #5505
studio: expose launcher capability bits on unauth /api/health by @danielhanchen in #5486
studio: tighten sandbox blocklist precision (bash, hf upload, NOFILE) by @danielhanchen in #5487
studio: scope cancel-cleanup to in-flight tmp dirs; walk back tool_call_id by @danielhanchen in #5488
studio: proxy-aware login rate-limit; allow google favicons in CSP by @danielhanchen in #5489
studio/frontend: wire logout, singleflight refresh, shared 422 helper, current-password input by @danielhanchen in #5490
tests/studio: lock in Windows GPU detection fix (#5106) with a synthetic CI test by @danielhanchen in #5376
Studio: auto-enable MTP speculative decoding for MTP GGUFs by @danielhanchen in #5527
Studio: warn when llama.cpp prebuilt is too old for MTP by @danielhanchen in #5528
Studio: warn when llama.cpp prebuilt is at least 3 days behind by @danielhanchen in #5529
studio: extend offline DNS auto-detect to inference parent + training by @danielhanchen in #5512
Fix ORPO text-only tokenization with processors by @alkinun in #5501
fix(studio/worker): inject --gcc-install-dir for HIP source builds on Ubuntu 24.04 by @h34v3nzc0dex in #5517
studio: gate image input on effective vision capability by @Etherll in #5492
studio/install: fix mac desktop shortcut spawning and lifecycle by @shimmyshimmer in #5496
studio: add uninstall.sh and document it in README by @shimmyshimmer in #5497
Studio update CI: round-trip install -> update -> uninstall by @danielhanchen in #5536
studio: fix Connections dialog UX issues surfaced by image-gate probe by @danielhanchen in #5518
studio: add uninstall.ps1 for Windows by @danielhanchen in #5513
Fix num_logits_to_keep regression on transformers >= 4.52 by @danielhanchen in #5538
Uninstaller script by @PTFOPlayer in #4611
Add OpenDocument chat attachments by @alkinun in #5510
studio/frontend: stop showing Generating spinner on empty welcome view by @shimmyshimmer in #5530
studio/frontend: swap Hugeicons spokes spinner for CSS ring by @shimmyshimmer in #5531
studio/frontend: grow chat composer to 16 rows and inset scrollbar by @shimmyshimmer in #5540
studio/frontend: make toast and inline error text selectable and copyable by @shimmyshimmer in #5506
studio: add dismissable toasts with corner close button by @shimmyshimmer in #5509
studio: install flash-linear-attention and tilelang for Qwen3.5 family by @danielhanchen in #5434
studio/frontend: soften toast shadow and tighten vertical padding by @shimmyshimmer in #5511
fast_generate: unify legacy/new logits kwarg + fix Mistral merge site by @danielhanchen in #5543
studio/frontend: hide Current password input on first boot by @danielhanchen in #5545
tests/studio: tighten MLX smoke gates (loss + round-trip, _on_step grad_norm) by @danielhanchen in #5537
tests + CI: callback signature drift detector by @danielhanchen in #5498
images: use narrower Discord button and drop duplicate by @danielhanchen in #5552
fix(studio): handle expired OpenAI shell-tool containers without surfacing error in chat by @rolandtannous in #5547
studio/chat: release stuck IME flag when compositionend never fires by @wtfashwin in #5551

New Contributors

@Anai-Guo made their first contribution in #5329
@uderbashi made their first contribution in #5168
@Tenith01 made their first contribution in #5356
@anmolxlight made their first contribution in #5451
@wtfashwin made their first contribution in #5524
@melroy89 made their first contribution in #5516
@h34v3nzc0dex made their first contribution in #5517
@PTFOPlayer made their first contribution in #4611

Full Changelog: v0.1.39-beta...v0.1.40-beta

What's Changed in Unsloth-Zoo

Register Gemma-4 MoE LoRA extractor to fix grouped_mm contraction crash by @danielhanchen in unslothai/unsloth-zoo#624
feat(mlx): Apple Silicon training (text + VLM, LoRA / full FT, CCE, export) by @Manan17 in unslothai/unsloth-zoo#620
tests: skip MoE LoRA extractor coverage when discovery finds zero classes by @danielhanchen in unslothai/unsloth-zoo#628
tests: pivot MoE-coverage canary to _unsloth_already_patched marker by @danielhanchen in unslothai/unsloth-zoo#630
fix(compiler): make higher_precision_softmax idempotent by @danielhanchen in unslothai/unsloth-zoo#631
fix(mlx): unblock GGUF export and LoRA reload on Apple Silicon by @danielhanchen in unslothai/unsloth-zoo#627
fix(compiler): unblock all model_types across transformers 4.57.6 and 5.x by @danielhanchen in unslothai/unsloth-zoo#632
Mask for gemma3 attn by @Datta0 in unslothai/unsloth-zoo#635
Multi Image GRPO by @Datta0 in unslothai/unsloth-zoo#613
[GRPO] Try returning hidden states for GRPO by @Datta0 in unslothai/unsloth-zoo#609
Refactor and consolidate moe lora extractors by @Datta0 in unslothai/unsloth-zoo#629
security + CI: mirror unsloth's hardening stack onto zoo (greenfield .github/) by @danielhanchen in unslothai/unsloth-zoo#637
remove unsloth_zoo/import_fixes.py: redundant with unsloth's by @danielhanchen in unslothai/unsloth-zoo#639
chore: trim verbose comments across PR #637 landing by @danielhanchen in unslothai/unsloth-zoo#640
scripts: ship deterministic comment / docstring-only diff verifier by @danielhanchen in unslothai/unsloth-zoo#641
fix mlx: Adds the MLX training path used by Studio on Apple Silicon by @mmathew23 in unslothai/unsloth-zoo#634
tests: drift detectors cover transformers 5.x (mirror unsloth PR #5423) by @danielhanchen in unslothai/unsloth-zoo#642
gpt_oss: reorder helpers before patch_gpt_oss_bnb4bit_auto by @danielhanchen in unslothai/unsloth-zoo#643
init: lazy-load legacy MLX aliases on every host by @danielhanchen in unslothai/unsloth-zoo#644
tests: contain security-conftest network block; fix stale mlx paths; skip GPU import in trainer-exec-marker by @danielhanchen in unslothai/unsloth-zoo#648
fix CI fallout from MLX subpackage refactor (#634) by @danielhanchen in unslothai/unsloth-zoo#646
tests: tolerate transformers 5.x source/signature drift in two zoo drift detectors by @danielhanchen in unslothai/unsloth-zoo#650
tests: skip _assert_params_superset when upstream forward is (*args, **kwargs) by @danielhanchen in unslothai/unsloth-zoo#651
mlx: lower max_grad_value default from 5.0 to 1.0 by @danielhanchen in unslothai/unsloth-zoo#652
saving: layout-aware MoE LoRA merge + loud-fail on fallback (#5410) by @danielhanchen in unslothai/unsloth-zoo#647
tests: follow MoE merge wrapper delegation in drift detector by @danielhanchen in unslothai/unsloth-zoo#653
additional import try except handling for mlx by @mmathew23 in unslothai/unsloth-zoo#654
Patch every LOSS_MAPPING key aliased to ForCausalLMLoss by @danielhanchen in unslothai/unsloth-zoo#656
deps: bump torch upper cap to <2.13.0 (allow xpu 2.11.0 / 2.12.0) by @danielhanchen in unslothai/unsloth-zoo#658
Auto-install fused lm_head + cross_entropy forward (opt-in) by @danielhanchen in unslothai/unsloth-zoo#657
tests: CPU regression detectors for the MoE merge / save path (#5410) by @danielhanchen in unslothai/unsloth-zoo#655
Fix GGUF converter sibling imports by @alkinun in unslothai/unsloth-zoo#661
fix embedding matrix size mismatch bug by @CodeMan62 in unslothai/unsloth-zoo#645
Honor UNSLOTH_RETURN_LOGITS in fused forward by @danielhanchen in unslothai/unsloth-zoo#665
init: include HF_DATASETS_OFFLINE in the offline env cross-sync by @danielhanchen in unslothai/unsloth-zoo#664
compiler: single-matmul opt-in for UNSLOTH_RETURN_LOGITS=1 by @danielhanchen in unslothai/unsloth-zoo#666
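Several Zoo changes make patches idempotent, such as `higher_precision_softmax` and the `_unsloth_already_patched` marker used by the MoE-coverage canary. A common way to get idempotence is to tag the replacement function with a marker attribute and skip re-patching when the marker is already present. A minimal sketch of that pattern on a toy class; `patch_once`, `ToyLayer` and `double_output` are hypothetical, the marker name is borrowed from the changelog, and this is not Zoo's actual implementation.

```python
from typing import Any, Callable

MARKER = "_unsloth_already_patched"  # name from the changelog; usage here is illustrative


def patch_once(owner: Any, name: str, make_wrapper: Callable[[Callable], Callable]) -> bool:
    """Replace owner.name with make_wrapper(original) unless it is already patched.

    Returns True if a patch was applied, False if the call was a no-op.
    """
    original = getattr(owner, name)
    if getattr(original, MARKER, False):
        return False  # Already patched: never wrap the wrapper a second time.
    wrapper = make_wrapper(original)
    setattr(wrapper, MARKER, True)
    setattr(owner, name, wrapper)
    return True


class ToyLayer:
    def forward(self, x: float) -> float:
        return x


def double_output(fn: Callable) -> Callable:
    def wrapper(self, x):
        return 2 * fn(self, x)
    return wrapper


first = patch_once(ToyLayer, "forward", double_output)
second = patch_once(ToyLayer, "forward", double_output)  # no-op
```

Without the marker, the second call would wrap the already-patched method again and `ToyLayer().forward(3)` would return 12 instead of 6.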

unslothai/unsloth v0.1.405-beta Qwen3.6 MTP and API / Connections on GitHub