GLM 5.2 GGUFs are now supported in Unsloth Studio, including all reasoning levels! Up to 3x longer context lengths are now achievable with our new auto-fit algorithm with MTP, allowing longer chats. Also new: Bypass Permissions mode, forkable chats, queueable chats, a new Hub for model discovery, parallel modules, HTTPS Cloudflare support and more! Use `unsloth studio --secure` for secure HTTPS global access!
To update Unsloth or install Unsloth Studio for the first time, use the commands below.
For the latest release, make sure your version is 2026.6.8 or v0.1.47-beta.
macOS, Linux, WSL:
```bash
curl -fsSL https://unsloth.ai/install.sh | sh
```
Windows:
```powershell
irm https://unsloth.ai/install.ps1 | iex
```
Better context length algorithm
In #6312 and #6447, we made Unsloth Studio much better at estimating memory usage and choosing the context length, achieving up to 3x longer context:
| Scenario | KV cache type | Context before (tokens) | Context after (tokens) |
|---|---|---|---|
| 1x 32 GB pipeline (~31 GB free) | f16 | 23,040 | 64,000 |
| | q8_0 | 43,520 | 114,944 |
| | q4_0 | 82,432 | 199,680 |
| 2x 32 GB pipeline | any | 262,144 | 262,144 |
| 2x 24 GB tensor (~23 GB free) | f16 | 134,049 | 262,144 |
| | q8_0 | 252,329 | 262,144 |
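The gains come from budgeting memory more precisely. As a rough illustration only (this is not Studio's auto-fit code, which per #6312 and #6447 also reserves memory for MTP, the compute buffer and the duplicated MTP target KV of MLA models), a fit estimate subtracts the weights and fixed reserves from free VRAM, then divides what is left by the KV cache cost per token. The model shape, reserve size and 256-token rounding below are hypothetical. The per-element sizes follow GGML's f16, q8_0 (34-byte blocks of 32 values) and q4_0 (18-byte blocks of 32 values) formats, which is why quantized KV caches fit roughly 1.9x and 3.6x more tokens than f16.

```python
# Hypothetical sketch of a VRAM-based context fit; NOT Unsloth Studio's algorithm.
GIB = 1024**3

# Bytes per stored KV element for GGML cache types.
BYTES_PER_ELEMENT = {"f16": 2.0, "q8_0": 34 / 32, "q4_0": 18 / 32}


def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int, cache_type: str) -> float:
    """Each token stores one K and one V vector of n_kv_heads * head_dim per layer."""
    elements = 2 * n_layers * n_kv_heads * head_dim
    return elements * BYTES_PER_ELEMENT[cache_type]


def fit_context(free_bytes: int, weights_bytes: int, reserve_bytes: int,
                per_token_bytes: float, max_context: int, step: int = 256) -> int:
    """Largest context (a multiple of `step`) whose KV cache fits the leftover budget."""
    budget = free_bytes - weights_bytes - reserve_bytes
    if budget <= 0:
        return 0
    tokens = int(budget // per_token_bytes)
    tokens -= tokens % step          # round down to a multiple of `step`
    return min(tokens, max_context)  # never exceed the model's maximum context


if __name__ == "__main__":
    # Hypothetical model: 48 layers, 8 KV heads of dim 128, 18 GiB of weights,
    # and 1.5 GiB reserved for compute buffers and an MTP draft.
    for kv in ("f16", "q8_0", "q4_0"):
        per_tok = kv_bytes_per_token(48, 8, 128, kv)
        ctx = fit_context(31 * GIB, 18 * GIB, 3 * GIB // 2, per_tok, max_context=262_144)
        print(f"{kv:5s} {ctx:>8,} tokens")
```

With these made-up numbers the script prints 62,720, 118,016 and 223,232 tokens; they show the ratio between cache types, not the behaviour of any real model.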
Chat Canvas, Forking & Queueing
- Edit assistant messages in place and re-run from any point in the thread.
- Fork a thread to branch a conversation without losing the original.
- Temporary (incognito) chats that leave nothing behind.
- Queue new prompts while a generation is still running instead of waiting.
- Chat "artifacts" are now called canvas: HTML canvas cards render inline automatically, there is a new Code view, and DiffusionGemma keeps its raw code visible inline instead of collapsing it.
- Chat search now covers every message and surfaces your own messages first.
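Forking and queueing are easy to picture with a small data model. The sketch below is hypothetical and does not reproduce Studio's implementation (#5810, #6244); the `Thread` and `PromptQueue` names are illustrative. It only shows the behaviour described above: a fork copies the messages up to a chosen point and leaves the original thread untouched, and prompts sent during a generation wait in first-in, first-out order.

```python
# Hypothetical data model for forkable threads and queued prompts; not Studio's code.
from __future__ import annotations

import itertools
from collections import deque
from dataclasses import dataclass, field

_thread_ids = itertools.count(1)


@dataclass
class Thread:
    messages: list[dict]
    parent_id: int | None = None
    id: int = field(default_factory=lambda: next(_thread_ids))

    def fork(self, upto: int) -> Thread:
        """Start a new branch from messages[:upto]; the original thread is unchanged."""
        copied = [dict(m) for m in self.messages[:upto]]
        return Thread(messages=copied, parent_id=self.id)


class PromptQueue:
    """Holds prompts submitted while a generation is still running (FIFO)."""

    def __init__(self) -> None:
        self.pending: deque[str] = deque()
        self.busy = False

    def submit(self, prompt: str) -> str | None:
        """Return the prompt to run now, or None if it was queued."""
        if self.busy:
            self.pending.append(prompt)
            return None
        self.busy = True
        return prompt

    def finish(self) -> str | None:
        """Call when a generation ends; returns the next queued prompt, if any."""
        if self.pending:
            return self.pending.popleft()
        self.busy = False
        return None
```

Copying each message dict on fork means edits in the branch (for example, re-running from an edited assistant message) cannot leak back into the original thread.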
Hub (Redesigned)
- Full-page Hub with a trending feed, search, and support for custom model paths.
- README preview in a split-view feed so you can read before you download.
- Downloads default to the faster Xet transport, with automatic HTTP fallback if a transfer stalls.
- New "Load on selection" toggle to set load options before a model loads.
- Google logo shown for DiffusionGemma and future Gemma derivatives.
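The download fallback follows a common pattern: stream from the preferred transport and switch to another one if no data arrives for a while. The sketch below is generic and hypothetical: `primary` and `fallback` stand in for a Xet and a plain-HTTP download (Studio's code from #6372 is not reproduced), and the 30-second stall timeout is an assumed value. A background thread feeds chunks through a queue, so a transport that blocks forever is still detected.

```python
# Hypothetical stall-detecting download with a fallback transport; not Studio's code.
from __future__ import annotations

import queue
import threading
from typing import Callable, Iterable

_DONE = object()


class DownloadStalled(Exception):
    """No data arrived within the stall timeout."""


def _drain(chunks: Iterable[bytes], stall_seconds: float) -> bytes:
    """Collect chunks, raising DownloadStalled if the source goes quiet."""
    q: queue.Queue = queue.Queue()

    def producer() -> None:
        try:
            for chunk in chunks:
                q.put(chunk)
            q.put(_DONE)
        except Exception as exc:  # hand transport errors to the consumer
            q.put(exc)

    threading.Thread(target=producer, daemon=True).start()
    parts: list[bytes] = []
    while True:
        try:
            item = q.get(timeout=stall_seconds)
        except queue.Empty:
            raise DownloadStalled(f"no data for {stall_seconds}s") from None
        if item is _DONE:
            return b"".join(parts)
        if isinstance(item, Exception):
            raise item
        parts.append(item)


def download(primary: Callable[[], Iterable[bytes]],
             fallback: Callable[[], Iterable[bytes]],
             stall_seconds: float = 30.0) -> tuple[str, bytes]:
    """Try the primary transport; restart on the fallback if it stalls."""
    try:
        return "primary", _drain(primary(), stall_seconds)
    except DownloadStalled:
        return "fallback", _drain(fallback(), stall_seconds)
```

A stalled producer thread is simply left to finish as a daemon here; real code would also cancel it and resume from the bytes already received instead of restarting from zero.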
Models & Inference
- DeepSeek-OCR and other vision models now load and run without errors.
- Fixed fast inference on the latest vLLM (0.22+) so speed-ups work again.
- Tensor parallelism is more reliable: if the faster MTP path fails, it now recovers on its own instead of crashing.
- DiffusionGemma now shows the image forming live as it denoises, with accurate speed stats.
Security & Cloudflare Encrypted Studios
- New `--secure` Cloudflare-only mode for end-to-end encrypted studios, with server-side tools staying enabled under `--secure`. Use `unsloth studio --secure`!
- Bypass Permissions mode to skip confirmations and disable the tool sandbox when you want it.
- Automatically checks Hugging Face virus-scan results and detects dangerous files in repos.
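As a rough illustration of what "dangerous files" usually means in a model repo: pickle-based checkpoints can execute arbitrary code when loaded, and native binaries or scripts rarely belong in a model repo at all. The extension lists below are an assumed heuristic, not the rules Studio actually applies, and they do not replace Hugging Face's own scanning.

```python
# Hypothetical heuristic for flagging risky files in a model repo; not Studio's rules.
from __future__ import annotations

from pathlib import PurePosixPath
from typing import Iterable

# Formats typically serialized with Python pickle (can run code on load).
PICKLE_SUFFIXES = {".bin", ".pt", ".pth", ".pkl", ".pickle", ".ckpt"}
# Native executables and scripts.
EXECUTABLE_SUFFIXES = {".exe", ".dll", ".so", ".dylib", ".bat", ".ps1"}


def flag_risky_files(filenames: Iterable[str]) -> dict[str, str]:
    """Map each suspicious path to a reason; safetensors/GGUF files are not flagged."""
    flagged: dict[str, str] = {}
    for name in filenames:
        suffix = PurePosixPath(name).suffix.lower()
        if suffix in PICKLE_SUFFIXES:
            flagged[name] = "pickle"
        elif suffix in EXECUTABLE_SUFFIXES:
            flagged[name] = "executable"
    return flagged
```

To check a real repo, the file list could come from `huggingface_hub.list_repo_files(repo_id)`.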
Logging and API
- New API server monitor in Studio.
- Faster API calls with lower latency.
- Much more streamlined logs: they now show throughput and latency, and a lot of noisy log output has been removed.
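Throughput and latency figures like these are usually derived from token timestamps. The sketch below is a hypothetical illustration, not Studio's logging code (which surfaces llama-server's own engine stats per #6377). Time to first token measures latency, and decode throughput counts only the tokens after the first, since the first token's delay is mostly prompt processing.

```python
# Hypothetical per-request latency/throughput summary; not Studio's logging code.
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RequestStats:
    ttft_s: float            # time to first token (latency)
    total_s: float           # request start to last token
    tokens: int
    decode_tok_per_s: float  # throughput after the first token


def summarize(request_start: float, token_times: list[float]) -> RequestStats:
    """Build stats from the request start time and each token's arrival time."""
    if not token_times:
        raise ValueError("no tokens generated")
    ttft = token_times[0] - request_start
    total = token_times[-1] - request_start
    decode_window = token_times[-1] - token_times[0]
    # Exclude the first token: its delay is dominated by prompt processing.
    rate = (len(token_times) - 1) / decode_window if decode_window > 0 else 0.0
    return RequestStats(ttft, total, len(token_times), rate)
```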
Hardware & Backend
- Better support for Blackwell RTX 50X and 60X series GPUs.
- Fixed silent fallback to the CPU instead of the GPU.
- The torchao version is now selected to match the installed torch.
- Installer now auto-repairs a broken or CPU-only PyTorch install and warns on silent CPU fallback, across NVIDIA + AMD on Win/Linux/Mac/WSL.
- Frees the chat model's VRAM when training starts, but only when the GPU is actually tight (no needless reloads otherwise).
- If llama-server hard-crashes at startup, Studio now steps through a recovery ladder instead of just failing.
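A "recovery ladder" is an ordered list of progressively more conservative launch settings, tried until one of them starts. The rungs below (disable MTP, then halve the context, then offload fewer layers to the GPU) and their key names are hypothetical examples; these notes do not list the steps Studio actually takes in #6291. The runtime MTP fallback for tensor parallelism (#6324) follows the same try-then-fall-back shape with a single rung.

```python
# Hypothetical recovery ladder; the rung settings are illustrative, not Studio's.
from __future__ import annotations

from typing import Any, Callable, Sequence

LADDER = [
    {"mtp": True, "ctx": 65536, "gpu_layers": 999},
    {"mtp": False, "ctx": 65536, "gpu_layers": 999},  # drop the MTP path
    {"mtp": False, "ctx": 32768, "gpu_layers": 999},  # halve the KV cache
    {"mtp": False, "ctx": 32768, "gpu_layers": 40},   # keep some layers on the CPU
]


def start_with_ladder(start: Callable[[dict], Any], rungs: Sequence[dict],
                      log: Callable[[str], None] = print) -> tuple[Any, dict]:
    """Try each rung in order and return (server, rung) for the first that starts."""
    last_error: Exception | None = None
    for rung in rungs:
        try:
            return start(rung), rung
        except Exception as exc:  # e.g. the server process exited during startup
            log(f"startup failed with {rung}: {exc!r}")
            last_error = exc
    raise RuntimeError(f"all {len(rungs)} recovery steps failed") from last_error
```

Here `start` is a hypothetical callable that launches the server with one rung's settings and raises if it crashes during startup.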
Training & General Fixes & Parallel Modules
- MLX training updates.
- Improved GRPO training reliability with vLLM.
- Training startup made more reliable, with clearer errors for invalid VLM batches.
- Studio now cleans up leftover backend processes more reliably after crashes, restarts, or interrupted shutdowns.
- Export, Chat, Training and Recipes are now fully compartmentalized, so you can run all four in parallel! For example, you can chat or run inference while you wait for a training run or an export to finish.
What's Changed
- Bump install.sh / install.ps1 pin to unsloth>=2026.6.4 by @danielhanchen in #6257
- Studio: account for mmproj VRAM in GGUF fit budget (#5825) by @hoobnn in #5849
- fix(studio): keep local GGUF vision on llama-server by @alkinun in #5770
- install.sh: keep the studio launch from draining the curl | sh script (WSL/dash) by @danielhanchen in #6258
- DiffusionGemma: set UNSLOTH_IS_PRESENT so the shim runs on a clean install by @danielhanchen in #6259
- studio: add keyboard navigation to model picker by @alkinun in #5628
- Bump install.sh / install.ps1 pin to unsloth>=2026.6.5 by @danielhanchen in #6260
- Installer: drop the lemonade ROCm fallback now the fork ships identical per-gfx prebuilts by @oobabooga in #6225
- Studio: keep distinct bpw flavors of the same GGUF quant by @bouclem in #5729
- studio: declare UNSLOTH_IS_PRESENT at backend startup (clean-install + Windows) by @danielhanchen in #6262
- Studio: extend llama.cpp first-token timeout by @Imagineer99 in #5841
- Studio: polish update banner layout and sidebar settings icon by @shimmyshimmer in #6266
- Studio: only advertise a Cloudflare tunnel once it actually serves by @oobabooga in #6264
- Studio: backfill the DiffusionGemma visual-server on a tag-matching update by @danielhanchen in #6267
- Studio: keep llama-server discovery from crashing on an access-denied candidate by @danielhanchen in #6268
- Bump install.sh / install.ps1 pin to unsloth>=2026.6.6 by @danielhanchen in #6270
- Tidy update banner and auth button spacing by @shimmyshimmer in #6279
- Fix llama.cpp prebuilt: skip already-installed same-release fallback by @danielhanchen in #6285
- Installer: drop redundant -WindowStyle Hidden from the Windows launcher VBS by @danielhanchen in #6284
- Fix Responses tool output content arrays by @alkinun in #6287
- Studio: UI polish for sidebar, menus, hub and toasts by @shimmyshimmer in #6288
- Upgrade setuptools and wheel in the auto-install command by @danielhanchen in #6282
- Studio: fix training output dir escaping outputs root for models on another drive by @danielhanchen in #6293
- Studio: offer llama.cpp update for same-base mix builds on source installs by @shimmyshimmer in #6280
- Studio: resolve studio home before the llama-only setup split by @danielhanchen in #6289
- Studio: force-refresh the llama.cpp update check so new builds are not masked by the 24h cache by @shimmyshimmer in #6278
- Studio: decide diffusion routing before the SWA resolver by @danielhanchen in #6299
- Bump install.sh / install.ps1 pin to unsloth>=2026.6.7 by @danielhanchen in #6301
- Studio: don't silently fall back to a CPU prebuilt on NVIDIA Linux GPU hosts by @oobabooga in #6310
- Studio: clarify llama.cpp update banner copy by @danielhanchen in #6313
- MLX Training updates by @mmathew23 in #5656
- Studio: add temporary (incognito) chat by @oobabooga in #5956
- Studio: enable stdio MCP servers on a loopback bind by @oobabooga in #6295
- fix(studio): Windows GGUF cancel hang + CPU spinlock overhead (#5692) by @anmolxlight in #5749
- feat: add Anthropic-compatible thinking parameter by @maattm in #5856
- Studio: Bypass Permissions (skip confirmation, disable tool sandbox) by @danielhanchen in #5895
- Studio: add --secure Cloudflare-only mode and revamp API usage examples by @danielhanchen in #6300
- Studio: arm the VRAM-settle wait after the startup orphan reaper by @danielhanchen in #6315
- Studio: fix Mac IME input-method switch leaving composer Send disabled by @narakai in #5762
- fix: use partial hipinfo output on crash to avoid CPU fallback (RDNA 4 / gfx1200) by @mvanhorn in #6292
- Rename chat artifacts copy to canvas by @wasimysaid in #6298
- studio: surface GGUF chat-template alternation errors clearly by @danielhanchen in #5980
- Fix version parsing NameError and older Python syntax compatibility by @umran666 in #6318
- Studio: loading spinner consolidation and sidebar hover pill polish by @shimmyshimmer in #6334
- Studio: show llama.cpp version and GPU specs in the About panel by @oobabooga in #6261
- Studio: keep training from failing when a namespace-package shadows unsloth by @danielhanchen in #6269
- feat(studio): expose provider_type selector in model provider dialog by @octo-patch in #4277
- Fix the library path for probe_server_capabilities() by @huaj1ng in #5797
- feat: implement thread forking functionality with associated database… by @Erildo in #5810
- Studio: rewrite logging middleware as pure ASGI by @wasimysaid in #6337
- Studio: queue chat prompts while generation is running by @Imagineer99 in #6244
- Studio: clearer error when adding a stdio (local command) MCP server by @NilayYadav in #6341
- Studio: add Studio version info to General settings by @Imagineer99 in #5675
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #6343
- Studio: make lifespan shutdown resilient to a dead default executor by @danielhanchen in #6307
- Studio: warn when a GPU model silently loaded on CPU by @danielhanchen in #6339
- Studio: improve logging for dynamic transformers version switching by @danielhanchen in #6108
- Chat search: search all messages, user messages first by @danielhanchen in #6350
- Studio: omit --threads when unset so llama.cpp picks physical cores by @danielhanchen in #5894
- Package scanners: cut false positives and make the CI gate blocking by @danielhanchen in #6355
- CLI: fix --local-dataset being parsed as a string instead of a list by @NilayYadav in #6357
- Fix GGUF variant file selection by @wasimysaid in #6342
- ci(studio): fix Linux tool-calling flake (Q4_K_XL) and capture server logs on 500 by @danielhanchen in #6360
- Make the Studio installers (sh + ps1) resilient to transient uv download failures by @danielhanchen in #6281
- Studio: remove the Windows VBS launcher to clear the Kaspersky false positive by @danielhanchen in #6326
- Harden Trainer._load_rng_state against malicious checkpoints (CVE-2026-1839) by @danielhanchen in #6351
- Studio: hide the llama-server install validation probe from model pickers by @danielhanchen in #6366
- Studio: Xet-primary model downloads with automatic HTTP fallback on stall by @danielhanchen in #6372
- Add API server monitor in Studio by @alkinun in #5558
- Studio: HTML canvas cards in chat with auto-render, a Code view, and visible diffusion code by @danielhanchen in #6374
- fix(unsloth-cli): route hub_path/hub_token correctly in --push_model save block by @Anai-Guo in #6346
- studio: deterministic VRAM auto-fit for GGUF (MTP reserve, compute buffer, total-based budget) by @danielhanchen in #6312
- Studio: surface the real reason a model fails validation by @danielhanchen in #6398
- studio: select torchao version from the installed torch by @danielhanchen in #6400
- feat(hub): full-page redesign with trending feed, search, and persisted state by @Sneakr in #6349
- Studio: trim serving-log noise and surface llama-server engine stats by @danielhanchen in #6377
- Studio Hub: preview first README in split view and round status chips into pills by @shimmyshimmer in #6404
- Studio: add 'Load on selection' toggle to configure load options before loading by @oobabooga in #6348
- Feature: implement editable assistant messages by @CelesteHeartsong in #6397
- Studio: follow OS theme by default and apply it on mount by @Etherll in #6405
- CLI: add `unsloth connect` to point coding agents at a local Studio server by @NilayYadav in #6407
- Shim removed `vllm.transformers_utils.tokenizer` so fast_inference works on vLLM >= 0.22 by @GodlyDonuts in #6390
- Studio: stop the llama.cpp update banner flickering and show the download size by @danielhanchen in #6338
- studio: fix tests turning main CI red/flaky (kill-process, install overrides, UI re-login) by @danielhanchen in #6419
- Studio: prefer native cuda13 over torch's cuda12 line on Blackwell Linux hosts by @danielhanchen in #6379
- Studio: convert SecurityHeadersMiddleware to pure ASGI by @danielhanchen in #6394
- Studio: serialize non-streaming responses once and pool the proxy client by @danielhanchen in #6393
- Studio: pin CUDA_DEVICE_ORDER=PCI_BUS_ID and list GPUs at startup by @LeoBorcherding in #6353
- test: regression guards for SecurityHeadersMiddleware pure-ASGI by @danielhanchen in #6424
- Studio: make toast text selectable by @shimmyshimmer in #6423
- Fix _kill_process AttributeError when _stats_logger is unset by @danielhanchen in #6417
- Reduce and tighten comments and docstrings across the test suite by @danielhanchen in #6429
- Studio: harden GPU startup detection and clarify multi-GPU listing by @danielhanchen in #6427
- Studio: make sidebar nav hover a fully rounded pill by @shimmyshimmer in #6435
- Studio Hub: show Google logo for diffusiongemma and future gemma derivatives by @shimmyshimmer in #6432
- Studio Hub: default downloads to Xet transport by @shimmyshimmer in #6433
- Harden model fetching by @danielhanchen in #6391
- Runtime MTP fallback for tensor parallelism (try MTP, recover if it crashes) by @danielhanchen in #6324
- Studio: scale export GGUF size estimates from the real model size by @danielhanchen in #6418
- Reap Studio child processes when the parent dies abnormally by @danielhanchen in #6425
- Studio: cross-session backstop to reap a leftover llama-server on startup by @danielhanchen in #6431
- Keep server-side tools enabled under --secure by @danielhanchen in #6403
- Studio: reach the published source asset when a mix build's commit 404s by @danielhanchen in #6314
- Studio: show an actionable message when the GGUF runtime is missing by @danielhanchen in #6327
- Fix: scan_packages.py --fix crash on download_packages() tuple return by @parveshsaini in #6413
- Align GRPO vllm_enable_sleep_mode with the engine's actual sleep state by @danielhanchen in #6420
- Studio: fail fast on an invalid first training batch (base VLM empty chat template) by @danielhanchen in #6358
- Package scanners: close fail-open gaps in the sdist fallback and hidden-payload paths by @danielhanchen in #6359
- Load DeepSeek-OCR and other VLMs that register AutoModel in auto_map by @danielhanchen in #6421
- Pin unsloth-zoo>=2026.6.5 in install.sh / install.ps1 by @danielhanchen in #6440
- Studio: square off the sidebar profile button hover so it isn't a pill by @shimmyshimmer in #6443
- Studio: polish Bypass permissions toggle and add it to chat menu settings by @shimmyshimmer in #6442
- Installer: repair stale/CPU-only PyTorch and warn on silent CPU fallback (NVIDIA + AMD, Win/Linux/Mac/WSL) by @danielhanchen in #5942
- Studio: free chat model VRAM at training start only when the GPU is tight by @danielhanchen in #6243
- Studio: graceful recovery ladder when llama-server hard-crashes at startup by @danielhanchen in #6291
- Studio: restore sidebar roundness to its pre-#6349 state by @shimmyshimmer in #6446
- Studio: Bypass Permissions menu fix, decimal GB sizes, and GLM-5.2 high/max/disabled thinking by @danielhanchen in #6444
- Studio: reserve the duplicated MTP target KV context for MLA models (GLM-5.2 OOM) by @danielhanchen in #6447
- diffusion_studio: split shim thought channels into reasoning_content by @raydeStar in unslothai/unsloth-zoo#769
- Use packaging.version to compare numpy versions by @ccoulombe in unslothai/unsloth-zoo#753
- MLX Update Training by @mmathew23 in unslothai/unsloth-zoo#684
- Guard mlx import in test_qwen35_vjp_metal so Linux collection skips cleanly by @danielhanchen in unslothai/unsloth-zoo#775
- fix(mlx): wrong gated-delta grads for batch rows past the first by @Lyxot in unslothai/unsloth-zoo#776
- test(mlx): batched gradient-parity regression for gated-delta VJP by @danielhanchen in unslothai/unsloth-zoo#778
- fix(full-FT): upcast LayerNorm weights to fp32 for bf16 training by @mmathew23 in unslothai/unsloth-zoo#680
- Tighten comments in bf16 full-FT norm upcast code by @danielhanchen in unslothai/unsloth-zoo#780
- fix(diffusion_studio): add torch/lib to PATH on Windows so visual-server loads CUDA backend by @Anai-Guo in unslothai/unsloth-zoo#770
- fix(diffusion): load bundled CUDA runtime + auto-size canvas for DiffusionGemma on Linux/WSL2 by @ThrownLemon in unslothai/unsloth-zoo#771
- fix(tests): import unsloth_zoo (not unsloth) in fresh-interpreter pickle tests by @danielhanchen in unslothai/unsloth-zoo#781
- Fix harmony analysis channel using thinking instead of content by @mvanhorn in unslothai/unsloth-zoo#772
- Fix Gemma-4 26B-A4B MoE LoRA merge dropping expert deltas by @danielhanchen in unslothai/unsloth-zoo#779
- Fix fast_inference crash on vLLM >= 0.22 (removed transformers_utils.tokenizer) by @GodlyDonuts in unslothai/unsloth-zoo#783
- Optimize LoRA merge to 16bit save and add end-to-end merge correctness suite by @danielhanchen in unslothai/unsloth-zoo#777
- fix(mlx): enforce save_total_limit checkpoint rotation by @Lyxot in unslothai/unsloth-zoo#782
- fix(mlx): cast Gemma3 language MLP activation to fp32 to prevent backward NaN by @BardiaKoopah in unslothai/unsloth-zoo#785
- fix(mlx): keep Qwen3-VL vision MLP fp32 when activation dtype is fp16 by @BardiaKoopah in unslothai/unsloth-zoo#787
- Fix: handle key format mismatch for VLM models (Ministral-3-14B) by @anmolxlight in unslothai/unsloth-zoo#773
- Fix LoRA fast_inference on new vLLM: get_dummy_lora_warmup_rank + compile downgrade by @danielhanchen in unslothai/unsloth-zoo#788
- test(mlx): verify VLM resume_from_checkpoint matches fresh run step-for-step by @BardiaKoopah in unslothai/unsloth-zoo#786
- fix(mlx): align response-mask batching with CUDA by @Lyxot in unslothai/unsloth-zoo#784
- Fix vision GRPO under vLLM standby, bundle converter resolution, DeepseekV2MoE rename by @danielhanchen in unslothai/unsloth-zoo#768
New Contributors
- @hoobnn made their first contribution in #5849
- @bouclem made their first contribution in #5729
- @maattm made their first contribution in #5856
- @narakai made their first contribution in #5762
- @umran666 made their first contribution in #6318
- @huaj1ng made their first contribution in #5797
- @Erildo made their first contribution in #5810
- @CelesteHeartsong made their first contribution in #6397
- @GodlyDonuts made their first contribution in #6390
- @parveshsaini made their first contribution in #6413
Full Changelog: v0.1.46-beta...v0.1.47-beta