We've got lots of new updates. Please use the latest Unsloth v0.1.405-beta, not v0.1.40-beta which is older.
- ~2x faster GGUF inference with automatically enabled MTP
- API support for OpenAI, Anthropic etc. with auto prompt caching, web search, code execution
- Connect to external inference backends: vLLM, Ollama llama-server
- Experimental MLX inference
- Proper support for non-English languages
- Security improvements
MTP speculative decoding support 1.4 to 2x faster inference!
- Auto MTP speculative decoding for MTP GGUFs; warn when the bundled llama.cpp prebuilt is stale or too old for MTP
- New pre-built llama.cpp binaries for MTP support!
API provider calling & external connections
- You can now connect Unsloth to any API cloud provider (OpenAI, Anthropic, OpenRouter etc.)
- Built-in web search for OpenAI, Anthropic, OpenRouter and Kimi
- Built-in code execution for OpenAI and Anthropic (Anthropic containers persist and are reused across turns)
- Prompt caching is enabled for OpenAI and Anthropic models saving 50 to 90% of costs.
- API key now optional for local providers (llama.cpp / vLLM / Ollama)
- Auto-load models when adding a cloud provider
MLX inference (Experimental)
- MLX quants and models now can run locally on your Mac machines!
- We'll be adding thinking, tools and web search soon!
Other Unsloth Studio updates
- OpenDocument chat attachments
- o3 reasoning summary payload
- Sending/prompting non-English languages (e.g. Japanese, Chinese) now works properly
- IME composer hardening, RTL
dir="auto", long log-line truncation fix - Tool reasoning trace rendering in UI
- Fully offline support: cached GGUF discovery and offline DNS auto-detect for both inference and training
- Lots of UI/UX polish: dark theme refactor, right sidebar redesign, time-of-day sloth mascot, dismissable copyable toasts, larger chat composer, code-execution config polish, composer action pill styling, narrower Discord button
Training updates
- Gemma attention mask fixes
- Multi Image GRPO
- GRPO hidden-state return experiments
- New Continued Pretraining (CPT) training method as a first-class option
- Gemma-4 MoE LoRA extractor registered to fix
grouped_mmcontraction crash - Opt-in fused
lm_head+ cross-entropy forward, with single-matmul path underUNSLOTH_RETURN_LOGITS=1 - Pass batch size for eval
- Eval/training paths now honour
HF_DATASETS_OFFLINEalongsideHF_HUB_OFFLINE
Unsloth Studio security improvements
- Authentication rate-limiting, proxy-aware so reverse proxies don't bypass it
- Sandboxed worker with a tightened blocklist (bash,
hf upload,NOFILE) - Path containment so workers can't escape their in-flight tmp dirs
- Strict schema validation across the Studio API
- Tightened CSP / security headers (only legitimate favicon hosts allowed)
- Removed the
torch.loadfallback ontraining_args.binso untrusted pickles can never execute on model load - Hardened Tauri desktop release flow
- Frontend auth: singleflight token refresh, current-password input on changes, working logout, shared 422 helper
- Cancel cleanup now scoped strictly to in-flight tmp dirs so it can never delete user state
Bug fixes and correctness
- Layout-aware MoE LoRA merge with loud-fail on fallback (no more silent wrong saves)
num_logits_to_keepregression fixed on transformers >= 4.52- Preserve tokenizer EOS token on merged saves
- Resume PEFT checkpoints under sentence-transformers >= 5.4
- Restore Flash > SDPA > Flex attention priority for non-Gemma3 models
- ORPO text-only tokenization now works with processors
- Embedding matrix size mismatch fix
- Vicuna chat template fix
fast_generateunifies legacy and new logits kwargs (fixes Mistral merge site)higher_precision_softmaxmade idempotent- Patch every
LOSS_MAPPINGkey aliased toForCausalLMLoss(covers transformers 5.x) - GGUF converter sibling imports fixed
- UTF-8 encoding added to all text-mode file operations
- Serialise GGUF reload and inherit
unsloth-runextra args - Fix
/recommended-folders500 on unreadable model directories under Python 3.12+ - Cross-family GGUF projector blocked in flat local dirs (no more wrong-vision-tower loads)
Installer and platform reliability
- Custom install paths via
STUDIO_HOME/UNSLOTH_STUDIO_HOME - CPU-only Linux x86_64 routed to
ggml-org/llama.cppprebuilts - Windows CUDA install fixes: paired
cudartbundle and Torch NVIDIA DLL paths added toPATH - Skip
flash-attninstall on Blackwell GPUs (sm_100+) - Refresh Intel XPU extras for torch 2.7.1 / 2.9.1 / 2.10 / 2.11.0 / 2.12.0; torch upper cap raised to <2.13.0
- HIP source builds on Ubuntu 24.04 now inject
--gcc-install-dir - Linux prebuilt fixes for branch-based llama.cpp releases (mangled symlink repair, top-level dir strip)
- New uninstallers for Linux, macOS (
uninstall.sh) and Windows (uninstall.ps1) - Mac desktop shortcut spawning and lifecycle fixed
unsloth --versionflag- Studio web update banner and release version display
- GPU pinned at 95% headroom, with a warning on silent CPU fallback
- Auto-install flash-linear-attention and tilelang for Qwen3.5 family
What's Changed in Unsloth
- Bump installer floor to 2026.5.2 by @danielhanchen in #5297
- install: support STUDIO_HOME / UNSLOTH_STUDIO_HOME for custom install paths by @danielhanchen in #5190
- Route CPU-only Linux x86_64 to ggml-org/llama.cpp prebuilts by @danielhanchen in #5302
- feat(studio): MLX training tab on Apple Silicon (LoRA / full FT, VLM, export) by @Manan17 in #5265
- feat(studio): add Continued Pretraining (CPT) as a training method by @OnePunchMonk in #4677
- Fix 14 stale tests under tests/studio/install/ that drifted from code by @danielhanchen in #5305
- Add Studio PR-time CI: pin enforcement, frontend, backend, wheel smoke by @danielhanchen in #5298
- Studio: restore Studio API and Help menu UI by @Imagineer99 in #5310
- [studio]: Fix tool reasoning trace in UI by @CodeMan62 in #5314
- fix: 3 patch_* helpers — fast_lora import, sft_trainer Union, openenv OSError by @danielhanchen in #5319
- Studio: API settings overflow with long Colab URLs by @Imagineer99 in #5286
- tests/studio/install: parallel UNSLOTH_STUDIO_HOME smoke test by @danielhanchen in #5306
- Studio: Dark theme refactor, right sidebar redesign, and chat UI polish by @Imagineer99 in #5150
- fix: harden Studio IME composer sends by @Etherll in #5327
- Studio: stop truncating long log lines as suspected base64 by @rolandtannous in #5335
- fix(gh_client): fail fast on 401/403 auth errors instead of retrying forever (#5325) by @Anai-Guo in #5329
- fix: unblock 4 tests deselected/skipped in #5312 (real bugs) by @danielhanchen in #5359
- fix(tests/sh): accept pinned tokenizers line after #5359 by @danielhanchen in #5361
- CI: scope GITHUB_TOKEN permissions, add MLX CI, unblock ~60 skipped tests by @danielhanchen in #5312
- studio/tests: make Playwright model-selector probe best-effort by @danielhanchen in #5371
- Studio: download paired cudart bundle on Windows CUDA installs by @danielhanchen in #5322
- Studio: add torch's pip nvidia DLL dirs to PATH on Windows by @danielhanchen in #5324
- studio: authenticate HF downloads across Studio CI workflows by @danielhanchen in #5370
- dependabot: group security updates and cover /studio/frontend npm advisories by @danielhanchen in #5372
- Add Studio web update banner and release version display by @wasimysaid in #5308
- ci/install: retry transient github.com 5xx on unsloth-zoo git fetches by @danielhanchen in #5389
- studio/ci: pre-install lockfile supply-chain audit (npm + cargo) by @danielhanchen in #5392
- studio/ci: npm tarball content scanner (no-install, hostile-input safe) by @danielhanchen in #5393
- studio/tests: AbortSignal-bound in-page fetches and wall-clock watchdog for Playwright probes by @danielhanchen in #5391
- chore: remove unused .semgrep/unsloth-rules.yml by @danielhanchen in #5395
- studio/ci: sweep actions/cache v5 hardening across sibling smoke workflows by @danielhanchen in #5399
- studio/ci: harden HF_HOME cache against actions/cache v5 silent restore failures by @danielhanchen in #5396
- Harden Tauri release flow by @wasimysaid in #5341
- Gemma attn by @Datta0 in #5346
- Multi Image GRPO by @Datta0 in #5197
- [GRPO] Try returning hidden statex for GRPO by @Datta0 in #5142
- Studio: pin GPU at 95% headroom and warn on silent CPU fallback by @danielhanchen in #5323
- Chore(deps): bump the actions group across 1 directory with 4 updates by @dependabot[bot] in #5394
- security: NOT affected by Mini Shai-Hulud (May-12 wave) -- forward-looking hardening only by @danielhanchen in #5397
- studio: security and hardening pass (auth rate-limit, sandbox, path containment, schema validation, headers) by @danielhanchen in #5375
- studio: fix training page regressions from the security hardening pass by @rolandtannous in #5409
- Studio: parity of thinking trace icon with Think toggle icon by @Imagineer99 in #5407
- Studio: vary empty chat sloth mascot by local time of day by @Imagineer99 in #5354
- security: persist-credentials:false on every actions/checkout (org-wide sweep) by @danielhanchen in #5413
- import_fixes: stub transformers.conversion_mapping so peft 0.19.x imports on transformers 4.x by @danielhanchen in #5416
- chore: trim verbose comments added in PR #5416 (commit 12295c1) by @danielhanchen in #5418
- studio/ci: flat GGUF+mmproj cache for Mac json-images smoke, save partial caches on cancel by @danielhanchen in #5417
- studio: comment out training_args.bin torch.load fallback in model_config by @danielhanchen in #5419
- tests: import_fixes drift detectors (HARD GATE on Core matrix) by @danielhanchen in #5414
- tests: drift detector parity with unsloth-zoo (fix Core matrix RED on triton + vllm) by @danielhanchen in #5421
- scripts: ship deterministic comment / docstring-only diff verifier by @danielhanchen in #5422
- studio: API external provider support for chat (OpenAI, Mistral, Gemini, Cohere, Anthropic, OpenRouter, DeepSeek, custom providers) by @rolandtannous in #4706
- import_fixes + drift detectors: cover transformers 5.x drift (unblocks PR #5376) by @danielhanchen in #5423
- MLX training support for Studio on Apple Silicon by @mmathew23 in #5340
- studio: drop unused max_grad_value schema + route plumbing by @danielhanchen in #5424
- Studio: Passing batch size for eval by @uderbashi in #5168
- studio: skip flash-attn install on Blackwell GPUs (sm_100+) by @rolandtannous in #5420
- Fix: Add missing utf-8 encoding to text-mode file operations by @Tenith01 in #5356
- Fix/issue 3667 vicuna template by @Tenith01 in #5357
- tests: public-api surface drift detector (companion to test_import_fixes_drift.py) by @danielhanchen in #5428
- add UNSLOTH_ALLOW_CPU=1 path for CPU-only CI / source-inspection tests by @danielhanchen in #5429
- fix(studio/mmproj): block cross-family projectors in flat local GGUF dirs (#5347) by @Anai-Guo in #5350
- studio/mmproj: skip unwanted GGUF values via seek instead of read by @danielhanchen in #5431
- ci: install ipython so transformers.utils.notebook imports cleanly in zoo pytest by @danielhanchen in #5437
- studio/mlx: lower per-element grad clip default from 5.0 to 1.0 by @danielhanchen in #5440
- studio/frontend: drop unused next dependency by @danielhanchen in #5438
- Update version-compat-ci.yml by @rolandtannous in #5445
- ci: merge duplicate
with:keys in notebooks-ci checkout steps by @rolandtannous in #5447 - studio/chat: built-in web search for OpenAI, Anthropic, OpenRouter, Kimi by @rolandtannous in #5443
- ci: make compiler-cache shim test order-independent by @danielhanchen in #5449
- Studio: o3 reasoning summary payload by @Imagineer99 in #5426
- ci: compiler-cache-shim must mutate live module globals + skip rerun by @danielhanchen in #5452
- Polish/cloud to providers by @Imagineer99 in #5450
- ci: cap each compiler-sweep iteration with SIGALRM + log progress by @danielhanchen in #5456
- ci: add tx >=5,<6 slow compile model_types to KNOWN_BROKEN_COMPILE by @danielhanchen in #5458
- Restore Flash > SDPA > Flex priority for non-gemma3 models by @mmathew23 in #5455
- ci: stop a partial mmproj cache from poisoning Mac Studio GGUF CI by @danielhanchen in #5459
- ci: make Windows Stop Studio teardown tolerate Git Bash signal exit by @danielhanchen in #5460
- Studio: make API key optional for local providers (llama.cpp/vLLM/Ollama) by @Imagineer99 in #5457
- studio/chat: built-in code execution for OpenAI + Anthropic by @rolandtannous in #5461
- ci: switch Windows Stop Studio to a cmd no-op marker by @danielhanchen in #5462
- tests: raise pwsh/bash subprocess timeout from 10s to 60s by @danielhanchen in #5463
- studio/install: repair upstream llama.cpp prebuilt mangled symlinks by @danielhanchen in #5465
- studio/chat: OpenAI container picker delete reliability by @rolandtannous in #5466
- studio/install: strip top-level dir from repaired symlink target by @danielhanchen in #5467
- Stop: drop Ollama API key, clean up code execution UI by @Imagineer99 in #5464
- tests/openai: patch httpx.AsyncClient ctor so delete tests hit mock by @danielhanchen in #5469
- revert: stop touching DEVICE_TYPE == cuda branches for CPU CI by @danielhanchen in #5473
- ci: drop
cache: 'npm'from setup-node (silent abort on Windows) by @danielhanchen in #5474 - ci: bump Mac json-images timeout 30 -> 45 min (cache-miss path) by @danielhanchen in #5475
- ci: wrap hf download in xet-tuned stall-retry loop (root-cause Mac 30-min hang) by @danielhanchen in #5476
- ci: deterministic check for studio/frontend dep removals by @danielhanchen in #5478
- studio/frontend: drop unused dependencies, move type pkg to devDeps by @danielhanchen in #5477
- intel-gpu: refresh xpu extras (fix torch 2.10, add 2.7.1 / 2.9.1 / 2.11.0 / 2.12.0) by @danielhanchen in #5484
- Studio: auto-load models when adding a cloud provider by @Imagineer99 in #5472
- Studio: code execution config visual polish by @Imagineer99 in #5471
- disable_torchcodec_if_broken: also patch datasets and clean sys.modules by @danielhanchen in #5483
- tests: pinned-symbol canary for unsloth-zoo save_pretrained_merged guards (#5410) by @danielhanchen in #5433
- intel-gpu: pin unsloth_zoo>=2026.5.2 (fixes #5494) by @danielhanchen in #5499
- fix(sentence_transformer): resume PEFT checkpoints under sentence-transformers >= 5.4 by @Etherll in #5454
- Studio: serialise GGUF reload and inherit unsloth-run extra args by @danielhanchen in #5427
- Studio: IME / multilingual composer regression test + RTL dir="auto" by @danielhanchen in #5485
- fix: preserve tokenizer eos token on merged saves by @anmolxlight in #5451
- studio/chat: reuse Anthropic code_execution container across turns by @rolandtannous in #5519
- Fix Linux prebuilt installs for branch-based llama.cpp releases by @mmathew23 in #5493
- Studio: stop hint, Uvicorn log rename, reachability check + Mac UI CI retry hardening by @danielhanchen in #5503
- Studio composer action pill styling by @Imagineer99 in #5522
- Fix /recommended-folders 500 on unreadable model directories (Python 3.12+) by @mmathew23 in #5523
- studio/chat: persist Anthropic container id on first turn of new thread by @rolandtannous in #5526
- studio/openai: align chat completions docstring with stream=false default (closes #5047) by @wtfashwin in #5524
- Add a simple --version flag by @melroy89 in #5516
- studio: load cached GGUF models when fully offline by @shimmyshimmer in #5505
- studio: expose launcher capability bits on unauth /api/health by @danielhanchen in #5486
- studio: tighten sandbox blocklist precision (bash, hf upload, NOFILE) by @danielhanchen in #5487
- studio: scope cancel-cleanup to in-flight tmp dirs; walk back tool_call_id by @danielhanchen in #5488
- studio: proxy-aware login rate-limit; allow google favicons in CSP by @danielhanchen in #5489
- studio/frontend: wire logout, singleflight refresh, shared 422 helper, current-password input by @danielhanchen in #5490
- tests/studio: lock in Windows GPU detection fix (#5106) with a synthetic CI test by @danielhanchen in #5376
- Studio: auto-enable MTP speculative decoding for MTP GGUFs by @danielhanchen in #5527
- Studio: warn when llama.cpp prebuilt is too old for MTP by @danielhanchen in #5528
- Studio: warn when llama.cpp prebuilt is at least 3 days behind by @danielhanchen in #5529
- studio: extend offline DNS auto-detect to inference parent + training by @danielhanchen in #5512
- Fix ORPO text-only tokenization with processors by @alkinun in #5501
- fix(studio/worker): inject --gcc-install-dir for HIP source builds on Ubuntu 24.04 by @h34v3nzc0dex in #5517
- studio: gate image input on effective vision capability by @Etherll in #5492
- studio/install: fix mac desktop shortcut spawning and lifecycle by @shimmyshimmer in #5496
- studio: add uninstall.sh and document it in README by @shimmyshimmer in #5497
- Studio update CI: round-trip install -> update -> uninstall by @danielhanchen in #5536
- studio: fix Connections dialog UX issues surfaced by image-gate probe by @danielhanchen in #5518
- studio: add uninstall.ps1 for Windows by @danielhanchen in #5513
- Fix num_logits_to_keep regression on transformers >= 4.52 by @danielhanchen in #5538
- Uninstaller script by @PTFOPlayer in #4611
- Add OpenDocument chat attachments by @alkinun in #5510
- studio/frontend: stop showing Generating spinner on empty welcome view by @shimmyshimmer in #5530
- studio/frontend: swap Hugeicons spokes spinner for CSS ring by @shimmyshimmer in #5531
- studio/frontend: grow chat composer to 16 rows and inset scrollbar by @shimmyshimmer in #5540
- studio/frontend: make toast and inline error text selectable and copyable by @shimmyshimmer in #5506
- studio: add dismissable toasts with corner close button by @shimmyshimmer in #5509
- studio: install flash-linear-attention and tilelang for Qwen3.5 family by @danielhanchen in #5434
- studio/frontend: soften toast shadow and tighten vertical padding by @shimmyshimmer in #5511
- fast_generate: unify legacy/new logits kwarg + fix Mistral merge site by @danielhanchen in #5543
- studio/frontend: hide Current password input on first boot by @danielhanchen in #5545
- tests/studio: tighten MLX smoke gates (loss + round-trip, _on_step grad_norm) by @danielhanchen in #5537
- tests + CI: callback signature drift detector by @danielhanchen in #5498
- images: use narrower Discord button and drop duplicate by @danielhanchen in #5552
- fix(studio): handle expired OpenAI shell-tool containers without surfacing error in chat by @rolandtannous in #5547
- studio/chat: release stuck IME flag when compositionend never fires by @wtfashwin in #5551
New Contributors
- @Anai-Guo made their first contribution in #5329
- @uderbashi made their first contribution in #5168
- @Tenith01 made their first contribution in #5356
- @anmolxlight made their first contribution in #5451
- @wtfashwin made their first contribution in #5524
- @melroy89 made their first contribution in #5516
- @h34v3nzc0dex made their first contribution in #5517
- @PTFOPlayer made their first contribution in #4611
Full Changelog: v0.1.39-beta...v0.1.40-beta
What's Changed in Unsloth-Zoo
- Register Gemma-4 MoE LoRA extractor to fix grouped_mm contraction crash by @danielhanchen in unslothai/unsloth-zoo#624
- feat(mlx): Apple Silicon training (text + VLM, LoRA / full FT, CCE, export) by @Manan17 in unslothai/unsloth-zoo#620
- tests: skip MoE LoRA extractor coverage when discovery finds zero classes by @danielhanchen in unslothai/unsloth-zoo#628
- tests: pivot MoE-coverage canary to _unsloth_already_patched marker by @danielhanchen in unslothai/unsloth-zoo#630
- fix(compiler): make higher_precision_softmax idempotent by @danielhanchen in unslothai/unsloth-zoo#631
- fix(mlx): unblock GGUF export and LoRA reload on Apple Silicon by @danielhanchen in unslothai/unsloth-zoo#627
- fix(compiler): unblock all model_types across transformers 4.57.6 and 5.x by @danielhanchen in unslothai/unsloth-zoo#632
- Mask for gemma3 attn by @Datta0 in unslothai/unsloth-zoo#635
- Multi Image GRPO by @Datta0 in unslothai/unsloth-zoo#613
- [GRPO] Try returning hidden statex for GRPO by @Datta0 in unslothai/unsloth-zoo#609
- Refactor and consolidate moe lora extractors by @Datta0 in unslothai/unsloth-zoo#629
- security + CI: mirror unsloth's hardening stack onto zoo (greenfield .github/) by @danielhanchen in unslothai/unsloth-zoo#637
- remove unsloth_zoo/import_fixes.py: redundant with unsloth's by @danielhanchen in unslothai/unsloth-zoo#639
- chore: trim verbose comments across PR #637 landing by @danielhanchen in unslothai/unsloth-zoo#640
- scripts: ship deterministic comment / docstring-only diff verifier by @danielhanchen in unslothai/unsloth-zoo#641
- fix mlx: Adds the MLX training path used by Studio on Apple Silicon by @mmathew23 in unslothai/unsloth-zoo#634
- tests: drift detectors cover transformers 5.x (mirror unsloth PR #5423) by @danielhanchen in unslothai/unsloth-zoo#642
- gpt_oss: reorder helpers before patch_gpt_oss_bnb4bit_auto by @danielhanchen in unslothai/unsloth-zoo#643
- init: lazy-load legacy MLX aliases on every host by @danielhanchen in unslothai/unsloth-zoo#644
- tests: contain security-conftest network block; fix stale mlx paths; skip GPU import in trainer-exec-marker by @danielhanchen in unslothai/unsloth-zoo#648
- fix CI fallout from MLX subpackage refactor (#634) by @danielhanchen in unslothai/unsloth-zoo#646
- tests: tolerate transformers 5.x source/signature drift in two zoo drift detectors by @danielhanchen in unslothai/unsloth-zoo#650
- tests: skip _assert_params_superset when upstream forward is (*args, **kwargs) by @danielhanchen in unslothai/unsloth-zoo#651
- mlx: lower max_grad_value default from 5.0 to 1.0 by @danielhanchen in unslothai/unsloth-zoo#652
- saving: layout-aware MoE LoRA merge + loud-fail on fallback (#5410) by @danielhanchen in unslothai/unsloth-zoo#647
- tests: follow MoE merge wrapper delegation in drift detector by @danielhanchen in unslothai/unsloth-zoo#653
- additional import try except handling for mlx by @mmathew23 in unslothai/unsloth-zoo#654
- Patch every LOSS_MAPPING key aliased to ForCausalLMLoss by @danielhanchen in unslothai/unsloth-zoo#656
- deps: bump torch upper cap to <2.13.0 (allow xpu 2.11.0 / 2.12.0) by @danielhanchen in unslothai/unsloth-zoo#658
- Auto-install fused lm_head + cross_entropy forward (opt-in) by @danielhanchen in unslothai/unsloth-zoo#657
- tests: CPU regression detectors for the MoE merge / save path (#5410) by @danielhanchen in unslothai/unsloth-zoo#655
- Fix GGUF converter sibling imports by @alkinun in unslothai/unsloth-zoo#661
- fix embedding matrix size mismatch bug by @CodeMan62 in unslothai/unsloth-zoo#645
- Honor UNSLOTH_RETURN_LOGITS in fused forward by @danielhanchen in unslothai/unsloth-zoo#665
- init: include HF_DATASETS_OFFLINE in the offline env cross-sync by @danielhanchen in unslothai/unsloth-zoo#664
- compiler: single-matmul opt-in for UNSLOTH_RETURN_LOGITS=1 by @danielhanchen in unslothai/unsloth-zoo#666