unslothai/unsloth v0.1.47-beta on GitHub

GLM 5.2 GGUFs are now supported in Unsloth Studio, with all reasoning levels supported. Our new auto-fit algorithm with MTP makes up to 3x longer context lengths achievable, allowing longer chats. This release also adds Bypass Permissions mode, forkable chats, queueable chats, a new hub for model discovery, parallel modules, HTTPS Cloudflare support and more! Use unsloth studio --secure for secure HTTPS global access!


To update Unsloth or install a new Unsloth Studio, use the commands below.
For the latest release, ensure your version is 2026.6.8 or v0.1.47-beta.

macOS, Linux, WSL:

curl -fsSL https://unsloth.ai/install.sh | sh

Windows:

irm https://unsloth.ai/install.ps1 | iex
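
After installing, the version requirement above can be checked with a small comparison. The following is a minimal sketch, not part of Unsloth: the helper names `parse_version` and `is_up_to_date` are hypothetical, and it assumes a plain dotted numeric version such as 2026.6.8 (suffixes like "-beta" are not handled). The installed version string could, for example, be read with the standard library's `importlib.metadata.version("unsloth")`, assuming the package is installed under that name.

```python
def parse_version(text):
    """Turn a dotted numeric version like '2026.6.8' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_up_to_date(installed, minimum="2026.6.8"):
    """Return True when the installed version is at least the minimum.

    Tuples compare element by element, so (2026, 10, 1) > (2026, 6, 8).
    """
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    for candidate in ("2026.6.7", "2026.6.8"):
        print(candidate, is_up_to_date(candidate))
```

Comparing integer tuples instead of raw strings avoids the pitfall where "2026.10.1" sorts before "2026.6.8" lexicographically.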

Better context length algorithm

As per #6312 and #6447, Unsloth Studio now estimates memory usage and the maximum context length much more accurately, achieving up to roughly 3x longer context on a single GPU:

Scenario	KV cache type	Context before (tokens)	Context after (tokens)
1x 32GB pipeline (~31 GB free)	f16	23,040	64,000
	q8_0	43,520	114,944
	q4_0	82,432	199,680
2x 32GB pipeline	any	262,144	262,144
2x 24GB tensor (~23 GB free)	f16	134,049	262,144
	q8_0	252,329	262,144
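
The per-row gain can be worked out directly from the table. The sketch below only divides the "after" value by the "before" value for each row; the numbers are copied from the table, and the helper name `summarize` is ours.

```python
# Context lengths copied from the table above:
# (scenario, KV cache type, before, after)
ROWS = [
    ("1x 32GB pipeline", "f16", 23_040, 64_000),
    ("1x 32GB pipeline", "q8_0", 43_520, 114_944),
    ("1x 32GB pipeline", "q4_0", 82_432, 199_680),
    ("2x 32GB pipeline", "any", 262_144, 262_144),
    ("2x 24GB tensor", "f16", 134_049, 262_144),
    ("2x 24GB tensor", "q8_0", 252_329, 262_144),
]


def summarize(rows):
    """Return (scenario, kv_type, after/before ratio rounded to 2 places)."""
    return [(name, kv, round(after / before, 2)) for name, kv, before, after in rows]


if __name__ == "__main__":
    for name, kv, ratio in summarize(ROWS):
        print(f"{name:18s} {kv:5s} {ratio:.2f}x")
```

The single-GPU rows come out at about 2.4x to 2.8x, while the multi-GPU rows gain less because they already sit at or near 262,144, which appears to act as a ceiling in this table.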

Chat Canvas, Forking & Queueing

Edit assistant messages in place and re-run from any point in the thread.
Fork a thread to branch a conversation without losing the original.
Temporary (incognito) chats that leave nothing behind.
Queue new prompts while a generation is still running instead of waiting.
Chat "artifacts" are now called canvas: inline HTML canvas cards auto-render, a Code view is available, and DiffusionGemma keeps its raw code visible inline instead of collapsing it.
Chat search now covers every message and surfaces your own messages first.
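
To illustrate what forking a thread means, here is a minimal sketch of one way to model it. It is not Studio's actual schema: the `Message` and `Thread` classes and the `fork` method are hypothetical names. The idea is that a fork copies the messages up to a chosen point into a new thread that remembers its parent, so edits in the branch never touch the original.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Message:
    role: str      # e.g. "user" or "assistant"
    content: str


@dataclass
class Thread:
    thread_id: str
    messages: List[Message] = field(default_factory=list)
    parent_id: Optional[str] = None  # set on forks, None for original threads

    def fork(self, new_id: str, upto: int) -> "Thread":
        """Copy messages[0..upto] (inclusive) into a new, independent thread."""
        copied = [Message(m.role, m.content) for m in self.messages[: upto + 1]]
        return Thread(thread_id=new_id, messages=copied, parent_id=self.thread_id)


if __name__ == "__main__":
    original = Thread("a", [Message("user", "hi"), Message("assistant", "hello")])
    branch = original.fork("b", upto=0)
    branch.messages.append(Message("user", "a different follow-up"))
    print(len(original.messages), len(branch.messages))
```

Copying each message, instead of sharing the same objects, is what lets an assistant message be edited in the branch while the original thread stays as it was.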

Hub (Redesigned)

Full-page Hub with a trending feed, search, and support for custom model paths.
README preview in a split-view feed so you can read before you download.
Downloads default to the faster Xet transport, with automatic HTTP fallback if a transfer stalls.
New "Load on selection" toggle to set load options before a model loads.
Google logo shown for DiffusionGemma and future Gemma derivatives.
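
The "Xet first, HTTP on stall" behaviour described in the list above follows a common primary-with-fallback pattern. The sketch below shows that pattern only and is not Studio's implementation: `TransferStalled`, `download_with_fallback`, and the transport callables are hypothetical, and how a real transport decides that a transfer has stalled is left out.

```python
class TransferStalled(Exception):
    """Raised by a transport when no bytes arrive within its stall window."""


def download_with_fallback(url, transports):
    """Try each (name, fetch) pair in order and return (name, data).

    Only a stall moves on to the next transport; any other exception
    propagates so real errors are not hidden by the fallback.
    """
    errors = []
    for name, fetch in transports:
        try:
            return name, fetch(url)
        except TransferStalled as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all transports stalled: " + "; ".join(errors))


if __name__ == "__main__":
    def fake_xet(url):
        raise TransferStalled("no bytes received")  # simulate a stalled transfer

    def fake_http(url):
        return b"model-bytes"

    print(download_with_fallback("repo/file.gguf", [("xet", fake_xet), ("http", fake_http)]))
```

Listing the transports in priority order keeps the fast path as the default and makes the fallback order explicit.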

Models & Inference

DeepSeek-OCR and more vision models now load and run without errors.
Fixed fast inference on the latest vLLM (0.22+) so speed-ups work again.
Tensor parallelism is more reliable: if the faster MTP path fails, it now recovers on its own instead of crashing.
DiffusionGemma now shows the image forming live as it denoises, with accurate speed stats.

Security & Cloudflare Encrypted Studios

New --secure Cloudflare-only mode for end-to-end encrypted studios, with server-side tools staying enabled under --secure. Use unsloth studio --secure!
Bypass Permissions mode to skip confirmations and disable the tool sandbox when you want it.
Automatic detection of Hugging Face virus-scan flags and dangerous files in repos.

Logging and API

New API server monitor in Studio.
Faster API calls with lower latency.
Streamlined logs that now report throughput and latency, with much of the log bloat removed.

Hardware & Backend

Better support for Blackwell RTX 50X and 60X GPUs
Fixed silent fallback to CPU when a GPU is available.
torchao version is now selected from the installed torch.
Installer now auto-repairs a broken or CPU-only PyTorch install and warns on silent CPU fallback, across NVIDIA + AMD on Win/Linux/Mac/WSL.
Frees the chat model's VRAM when training starts, but only when the GPU is actually tight (no needless reloads otherwise).
If llama-server hard-crashes at startup, Studio now steps through a recovery ladder instead of just failing.
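
The "only when the GPU is actually tight" rule for freeing the chat model's VRAM can be pictured as a simple threshold check. This is a minimal sketch under stated assumptions, not Studio's logic: the function name, the training-memory estimate, and the 1 GiB headroom are all hypothetical. In a PyTorch environment the free VRAM figure could come from `torch.cuda.mem_get_info()`, which returns free and total bytes.

```python
GIB = 1024 ** 3


def should_free_chat_model(free_vram_bytes, training_need_bytes, headroom_bytes=1 * GIB):
    """Return True only when training would not fit alongside the chat model.

    free_vram_bytes:     VRAM currently free with the chat model still loaded.
    training_need_bytes: estimated VRAM the training run needs (assumed known).
    headroom_bytes:      safety margin; the 1 GiB default is an arbitrary choice.
    """
    return free_vram_bytes < training_need_bytes + headroom_bytes


if __name__ == "__main__":
    # Tight GPU: 4 GiB free, training needs 10 GiB, so unload the chat model.
    print(should_free_chat_model(4 * GIB, 10 * GIB))
    # Roomy GPU: 20 GiB free, so keep the chat model loaded and skip the reload.
    print(should_free_chat_model(20 * GIB, 10 * GIB))
```

Checking before unloading is what avoids the needless reloads mentioned above: when both workloads fit, the chat model stays resident.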

Training, General Fixes & Parallel Modules

MLX training updates.
Improved GRPO training reliability with vLLM.
Training startup made more reliable, with clearer errors for invalid VLM batches.
Studio now cleans up leftover backend processes more reliably after crashes, restarts, or interrupted shutdowns.
Export, Chat, Training, and Recipes are now isolated from one another, so all four can run in parallel. You can chat or run inference while you wait for a training run or an export.

What's Changed

Bump install.sh / install.ps1 pin to unsloth>=2026.6.4 by @danielhanchen in #6257
Studio: account for mmproj VRAM in GGUF fit budget (#5825) by @hoobnn in #5849
fix(studio): keep local GGUF vision on llama-server by @alkinun in #5770
install.sh: keep the studio launch from draining the curl | sh script (WSL/dash) by @danielhanchen in #6258
DiffusionGemma: set UNSLOTH_IS_PRESENT so the shim runs on a clean install by @danielhanchen in #6259
studio: add keyboard navigation to model picker by @alkinun in #5628
Bump install.sh / install.ps1 pin to unsloth>=2026.6.5 by @danielhanchen in #6260
Installer: drop the lemonade ROCm fallback now the fork ships identical per-gfx prebuilts by @oobabooga in #6225
Studio: keep distinct bpw flavors of the same GGUF quant by @bouclem in #5729
studio: declare UNSLOTH_IS_PRESENT at backend startup (clean-install + Windows) by @danielhanchen in #6262
Studio: extend llama.cpp first-token timeout by @Imagineer99 in #5841
Studio: polish update banner layout and sidebar settings icon by @shimmyshimmer in #6266
Studio: only advertise a Cloudflare tunnel once it actually serves by @oobabooga in #6264
Studio: backfill the DiffusionGemma visual-server on a tag-matching update by @danielhanchen in #6267
Studio: keep llama-server discovery from crashing on an access-denied candidate by @danielhanchen in #6268
Bump install.sh / install.ps1 pin to unsloth>=2026.6.6 by @danielhanchen in #6270
Tidy update banner and auth button spacing by @shimmyshimmer in #6279
Fix llama.cpp prebuilt: skip already-installed same-release fallback by @danielhanchen in #6285
Installer: drop redundant -WindowStyle Hidden from the Windows launcher VBS by @danielhanchen in #6284
Fix Responses tool output content arrays by @alkinun in #6287
Studio: UI polish for sidebar, menus, hub and toasts by @shimmyshimmer in #6288
Upgrade setuptools and wheel in the auto-install command by @danielhanchen in #6282
Studio: fix training output dir escaping outputs root for models on another drive by @danielhanchen in #6293
Studio: offer llama.cpp update for same-base mix builds on source installs by @shimmyshimmer in #6280
Studio: resolve studio home before the llama-only setup split by @danielhanchen in #6289
Studio: force-refresh the llama.cpp update check so new builds are not masked by the 24h cache by @shimmyshimmer in #6278
Studio: decide diffusion routing before the SWA resolver by @danielhanchen in #6299
Bump install.sh / install.ps1 pin to unsloth>=2026.6.7 by @danielhanchen in #6301
Studio: don't silently fall back to a CPU prebuilt on NVIDIA Linux GPU hosts by @oobabooga in #6310
Studio: clarify llama.cpp update banner copy by @danielhanchen in #6313
MLX Training updates by @mmathew23 in #5656
Studio: add temporary (incognito) chat by @oobabooga in #5956
Studio: enable stdio MCP servers on a loopback bind by @oobabooga in #6295
fix(studio): Windows GGUF cancel hang + CPU spinlock overhead (#5692) by @anmolxlight in #5749
feat: add Anthropic-compatible thinking parameter by @maattm in #5856
Studio: Bypass Permissions (skip confirmation, disable tool sandbox) by @danielhanchen in #5895
Studio: add --secure Cloudflare-only mode and revamp API usage examples by @danielhanchen in #6300
Studio: arm the VRAM-settle wait after the startup orphan reaper by @danielhanchen in #6315
Studio: fix Mac IME input-method switch leaving composer Send disabled by @narakai in #5762
fix: use partial hipinfo output on crash to avoid CPU fallback (RDNA 4 / gfx1200) by @mvanhorn in #6292
Rename chat artifacts copy to canvas by @wasimysaid in #6298
studio: surface GGUF chat-template alternation errors clearly by @danielhanchen in #5980
Fix version parsing NameError and older Python syntax compatibility by @umran666 in #6318
Studio: loading spinner consolidation and sidebar hover pill polish by @shimmyshimmer in #6334
Studio: show llama.cpp version and GPU specs in the About panel by @oobabooga in #6261
Studio: keep training from failing when a namespace-package shadows unsloth by @danielhanchen in #6269
feat(studio): expose provider_type selector in model provider dialog by @octo-patch in #4277
Fix the libaray path for probe_server_capabilities() by @huaj1ng in #5797
feat: implement thread forking functionality with associated database… by @Erildo in #5810
Studio: rewrite logging middleware as pure ASGI by @wasimysaid in #6337
Studio: queue chat prompts while generation is running by @Imagineer99 in #6244
Studio: clearer error when adding a stdio (local command) MCP server by @NilayYadav in #6341
Studio: add Studio version info to General settings by @Imagineer99 in #5675
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #6343
Studio: make lifespan shutdown resilient to a dead default executor by @danielhanchen in #6307
Studio: warn when a GPU model silently loaded on CPU by @danielhanchen in #6339
Studio: improve logging for dynamic transformers version switching by @danielhanchen in #6108
Chat search: search all messages, user messages first by @danielhanchen in #6350
Studio: omit --threads when unset so llama.cpp picks physical cores by @danielhanchen in #5894
Package scanners: cut false positives and make the CI gate blocking by @danielhanchen in #6355
CLI: fix --local-dataset being parsed as a string instead of a list by @NilayYadav in #6357
Fix GGUF variant file selection by @wasimysaid in #6342
ci(studio): fix Linux tool-calling flake (Q4_K_XL) and capture server logs on 500 by @danielhanchen in #6360
Make the Studio installers (sh + ps1) resilient to transient uv download failures by @danielhanchen in #6281
Studio: remove the Windows VBS launcher to clear the Kaspersky false positive by @danielhanchen in #6326
Harden Trainer._load_rng_state against malicious checkpoints (CVE-2026-1839) by @danielhanchen in #6351
Studio: hide the llama-server install validation probe from model pickers by @danielhanchen in #6366
Studio: Xet-primary model downloads with automatic HTTP fallback on stall by @danielhanchen in #6372
Add API server monitor in Studio by @alkinun in #5558
Studio: HTML canvas cards in chat with auto-render, a Code view, and visible diffusion code by @danielhanchen in #6374
fix(unsloth-cli): route hub_path/hub_token correctly in --push_model save block by @Anai-Guo in #6346
studio: deterministic VRAM auto-fit for GGUF (MTP reserve, compute buffer, total-based budget) by @danielhanchen in #6312
Studio: surface the real reason a model fails validation by @danielhanchen in #6398
studio: select torchao version from the installed torch by @danielhanchen in #6400
feat(hub): full-page redesign with trending feed, search, and persisted state by @Sneakr in #6349
Studio: trim serving-log noise and surface llama-server engine stats by @danielhanchen in #6377
Studio Hub: preview first README in split view and round status chips into pills by @shimmyshimmer in #6404
Studio: add 'Load on selection' toggle to configure load options before loading by @oobabooga in #6348
Feature: implement editable assistant messages by @CelesteHeartsong in #6397
Studio: follow OS theme by default and apply it on mount by @Etherll in #6405
CLI: add unsloth connect to point coding agents at a local Studio server by @NilayYadav in #6407
Shim removed vllm.transformers_utils.tokenizer so fast_inference works on vLLM >= 0.22 by @GodlyDonuts in #6390
Studio: stop the llama.cpp update banner flickering and show the download size by @danielhanchen in #6338
studio: fix tests turning main CI red/flaky (kill-process, install overrides, UI re-login) by @danielhanchen in #6419
Studio: prefer native cuda13 over torch's cuda12 line on Blackwell Linux hosts by @danielhanchen in #6379
Studio: convert SecurityHeadersMiddleware to pure ASGI by @danielhanchen in #6394
Studio: serialize non-streaming responses once and pool the proxy client by @danielhanchen in #6393
Studio: pin CUDA_DEVICE_ORDER=PCI_BUS_ID and list GPUs at startup by @LeoBorcherding in #6353
test: regression guards for SecurityHeadersMiddleware pure-ASGI by @danielhanchen in #6424
Studio: make toast text selectable by @shimmyshimmer in #6423
Fix _kill_process AttributeError when _stats_logger is unset by @danielhanchen in #6417
Reduce and tighten comments and docstrings across the test suite by @danielhanchen in #6429
Studio: harden GPU startup detection and clarify multi-GPU listing by @danielhanchen in #6427
Studio: make sidebar nav hover a fully rounded pill by @shimmyshimmer in #6435
Studio Hub: show Google logo for diffusiongemma and future gemma derivatives by @shimmyshimmer in #6432
Studio Hub: default downloads to Xet transport by @shimmyshimmer in #6433
Harden model fetching by @danielhanchen in #6391
Runtime MTP fallback for tensor parallelism (try MTP, recover if it crashes) by @danielhanchen in #6324
Studio: scale export GGUF size estimates from the real model size by @danielhanchen in #6418
Reap Studio child processes when the parent dies abnormally by @danielhanchen in #6425
Studio: cross-session backstop to reap a leftover llama-server on startup by @danielhanchen in #6431
Keep server-side tools enabled under --secure by @danielhanchen in #6403
Studio: reach the published source asset when a mix build's commit 404s by @danielhanchen in #6314
Studio: show an actionable message when the GGUF runtime is missing by @danielhanchen in #6327
Fix: scan_packages.py --fix crash on download_packages() tuple return by @parveshsaini in #6413
Align GRPO vllm_enable_sleep_mode with the engine's actual sleep state by @danielhanchen in #6420
Studio: fail fast on an invalid first training batch (base VLM empty chat template) by @danielhanchen in #6358
Package scanners: close fail-open gaps in the sdist fallback and hidden-payload paths by @danielhanchen in #6359
Load DeepSeek-OCR and other VLMs that register AutoModel in auto_map by @danielhanchen in #6421
Pin unsloth-zoo>=2026.6.5 in install.sh / install.ps1 by @danielhanchen in #6440
Studio: square off the sidebar profile button hover so it isn't a pill by @shimmyshimmer in #6443
Studio: polish Bypass permissions toggle and add it to chat menu settings by @shimmyshimmer in #6442
Installer: repair stale/CPU-only PyTorch and warn on silent CPU fallback (NVIDIA + AMD, Win/Linux/Mac/WSL) by @danielhanchen in #5942
Studio: free chat model VRAM at training start only when the GPU is tight by @danielhanchen in #6243
Studio: graceful recovery ladder when llama-server hard-crashes at startup by @danielhanchen in #6291
Studio: restore sidebar roundness to its pre-#6349 state by @shimmyshimmer in #6446
Studio: Bypass Permissions menu fix, decimal GB sizes, and GLM-5.2 high/max/disabled thinking by @danielhanchen in #6444
Studio: reserve the duplicated MTP target KV context for MLA models (GLM-5.2 OOM) by @danielhanchen in #6447
diffusion_studio: split shim thought channels into reasoning_content by @raydeStar in unslothai/unsloth-zoo#769
Use packaging.version to compare numpy versions by @ccoulombe in unslothai/unsloth-zoo#753
MLX Update Training by @mmathew23 in unslothai/unsloth-zoo#684
Guard mlx import in test_qwen35_vjp_metal so Linux collection skips cleanly by @danielhanchen in unslothai/unsloth-zoo#775
fix(mlx): wrong gated-delta grads for batch rows past the first by @Lyxot in unslothai/unsloth-zoo#776
test(mlx): batched gradient-parity regression for gated-delta VJP by @danielhanchen in unslothai/unsloth-zoo#778
fix(full-FT): upcast LayerNorm weights to fp32 for bf16 training by @mmathew23 in unslothai/unsloth-zoo#680
Tighten comments in bf16 full-FT norm upcast code by @danielhanchen in unslothai/unsloth-zoo#780
fix(diffusion_studio): add torch/lib to PATH on Windows so visual-server loads CUDA backend by @Anai-Guo in unslothai/unsloth-zoo#770
fix(diffusion): load bundled CUDA runtime + auto-size canvas for DiffusionGemma on Linux/WSL2 by @ThrownLemon in unslothai/unsloth-zoo#771
fix(tests): import unsloth_zoo (not unsloth) in fresh-interpreter pickle tests by @danielhanchen in unslothai/unsloth-zoo#781
Fix harmony analysis channel using thinking instead of content by @mvanhorn in unslothai/unsloth-zoo#772
Fix Gemma-4 26B-A4B MoE LoRA merge dropping expert deltas by @danielhanchen in unslothai/unsloth-zoo#779
Fix fast_inference crash on vLLM >= 0.22 (removed transformers_utils.tokenizer) by @GodlyDonuts in unslothai/unsloth-zoo#783
Optimize LoRA merge to 16bit save and add end-to-end merge correctness suite by @danielhanchen in unslothai/unsloth-zoo#777
fix(mlx): enforce save_total_limit checkpoint rotation by @Lyxot in unslothai/unsloth-zoo#782
fix(mlx): cast Gemma3 language MLP activation to fp32 to prevent backward NaN by @BardiaKoopah in unslothai/unsloth-zoo#785
fix(mlx): keep Qwen3-VL vision MLP fp32 when activation dtype is fp16 by @BardiaKoopah in unslothai/unsloth-zoo#787
Fix: handle key format mismatch for VLM models (Ministral-3-14B) by @anmolxlight in unslothai/unsloth-zoo#773
Fix LoRA fast_inference on new vLLM: get_dummy_lora_warmup_rank + compile downgrade by @danielhanchen in unslothai/unsloth-zoo#788
test(mlx): verify VLM resume_from_checkpoint matches fresh run step-for-step by @BardiaKoopah in unslothai/unsloth-zoo#786
fix(mlx): align response-mask batching with CUDA by @Lyxot in unslothai/unsloth-zoo#784
Fix vision GRPO under vLLM standby, bundle converter resolution, DeepseekV2MoE rename by @danielhanchen in unslothai/unsloth-zoo#768

New Contributors

@hoobnn made their first contribution in #5849
@bouclem made their first contribution in #5729
@maattm made their first contribution in #5856
@narakai made their first contribution in #5762
@umran666 made their first contribution in #6318
@huaj1ng made their first contribution in #5797
@Erildo made their first contribution in #5810
@CelesteHeartsong made their first contribution in #6397
@GodlyDonuts made their first contribution in #6390
@parveshsaini made their first contribution in #6413

Full Changelog: v0.1.46-beta...v0.1.47-beta
