github unslothai/unsloth v0.1.405-beta
Qwen3.6 MTP and API / Connections

3 hours ago

We've got lots of new updates. Please use the latest Unsloth v0.1.405-beta, not v0.1.40-beta which is older.

  • ~2x faster GGUF inference with automatically enabled MTP
  • API support for OpenAI, Anthropic etc. with auto prompt caching, web search, code execution
  • Connect to external inference backends: vLLM, Ollama llama-server
  • Experimental MLX inference
  • Proper support for non-English languages
  • Security improvements

MTP speculative decoding support 1.4 to 2x faster inference!

  • Auto MTP speculative decoding for MTP GGUFs; warn when the bundled llama.cpp prebuilt is stale or too old for MTP
  • New pre-built llama.cpp binaries for MTP support!

API provider calling & external connections

  • You can now connect Unsloth to any API cloud provider (OpenAI, Anthropic, OpenRouter etc.)
  • Built-in web search for OpenAI, Anthropic, OpenRouter and Kimi
  • Built-in code execution for OpenAI and Anthropic (Anthropic containers persist and are reused across turns)
  • Prompt caching is enabled for OpenAI and Anthropic models saving 50 to 90% of costs.
  • API key now optional for local providers (llama.cpp / vLLM / Ollama)
  • Auto-load models when adding a cloud provider

MLX inference (Experimental)

  • MLX quants and models now can run locally on your Mac machines!
  • We'll be adding thinking, tools and web search soon!

Other Unsloth Studio updates

  • OpenDocument chat attachments
  • o3 reasoning summary payload
  • Sending/prompting non-English languages (e.g. Japanese, Chinese) now works properly
  • IME composer hardening, RTL dir="auto", long log-line truncation fix
  • Tool reasoning trace rendering in UI
  • Fully offline support: cached GGUF discovery and offline DNS auto-detect for both inference and training
  • Lots of UI/UX polish: dark theme refactor, right sidebar redesign, time-of-day sloth mascot, dismissable copyable toasts, larger chat composer, code-execution config polish, composer action pill styling, narrower Discord button

Training updates

  • Gemma attention mask fixes
  • Multi Image GRPO
  • GRPO hidden-state return experiments
  • New Continued Pretraining (CPT) training method as a first-class option
  • Gemma-4 MoE LoRA extractor registered to fix grouped_mm contraction crash
  • Opt-in fused lm_head + cross-entropy forward, with single-matmul path under UNSLOTH_RETURN_LOGITS=1
  • Pass batch size for eval
  • Eval/training paths now honour HF_DATASETS_OFFLINE alongside HF_HUB_OFFLINE

Unsloth Studio security improvements

  • Authentication rate-limiting, proxy-aware so reverse proxies don't bypass it
  • Sandboxed worker with a tightened blocklist (bash, hf upload, NOFILE)
  • Path containment so workers can't escape their in-flight tmp dirs
  • Strict schema validation across the Studio API
  • Tightened CSP / security headers (only legitimate favicon hosts allowed)
  • Removed the torch.load fallback on training_args.bin so untrusted pickles can never execute on model load
  • Hardened Tauri desktop release flow
  • Frontend auth: singleflight token refresh, current-password input on changes, working logout, shared 422 helper
  • Cancel cleanup now scoped strictly to in-flight tmp dirs so it can never delete user state

Bug fixes and correctness

  • Layout-aware MoE LoRA merge with loud-fail on fallback (no more silent wrong saves)
  • num_logits_to_keep regression fixed on transformers >= 4.52
  • Preserve tokenizer EOS token on merged saves
  • Resume PEFT checkpoints under sentence-transformers >= 5.4
  • Restore Flash > SDPA > Flex attention priority for non-Gemma3 models
  • ORPO text-only tokenization now works with processors
  • Embedding matrix size mismatch fix
  • Vicuna chat template fix
  • fast_generate unifies legacy and new logits kwargs (fixes Mistral merge site)
  • higher_precision_softmax made idempotent
  • Patch every LOSS_MAPPING key aliased to ForCausalLMLoss (covers transformers 5.x)
  • GGUF converter sibling imports fixed
  • UTF-8 encoding added to all text-mode file operations
  • Serialise GGUF reload and inherit unsloth-run extra args
  • Fix /recommended-folders 500 on unreadable model directories under Python 3.12+
  • Cross-family GGUF projector blocked in flat local dirs (no more wrong-vision-tower loads)

Installer and platform reliability

  • Custom install paths via STUDIO_HOME / UNSLOTH_STUDIO_HOME
  • CPU-only Linux x86_64 routed to ggml-org/llama.cpp prebuilts
  • Windows CUDA install fixes: paired cudart bundle and Torch NVIDIA DLL paths added to PATH
  • Skip flash-attn install on Blackwell GPUs (sm_100+)
  • Refresh Intel XPU extras for torch 2.7.1 / 2.9.1 / 2.10 / 2.11.0 / 2.12.0; torch upper cap raised to <2.13.0
  • HIP source builds on Ubuntu 24.04 now inject --gcc-install-dir
  • Linux prebuilt fixes for branch-based llama.cpp releases (mangled symlink repair, top-level dir strip)
  • New uninstallers for Linux, macOS (uninstall.sh) and Windows (uninstall.ps1)
  • Mac desktop shortcut spawning and lifecycle fixed
  • unsloth --version flag
  • Studio web update banner and release version display
  • GPU pinned at 95% headroom, with a warning on silent CPU fallback
  • Auto-install flash-linear-attention and tilelang for Qwen3.5 family

What's Changed in Unsloth

New Contributors

Full Changelog: v0.1.39-beta...v0.1.40-beta

What's Changed in Unsloth-Zoo

Don't miss a new unsloth release

NewReleases is sending notifications on new releases.