github unslothai/unsloth v0.1.47-beta
GLM 5.2, 3x longer contexts


GLM 5.2 GGUFs are now supported in Unsloth Studio, with all reasoning levels supported. Our new auto-fit algorithm with MTP makes 3x longer context lengths achievable, allowing longer chats. This release also adds Bypass Permissions mode, forkable chats, queueable chats, a new Hub for model discovery, parallel modules, HTTPS Cloudflare support and more. Use unsloth studio --secure for secure HTTPS global access!

[Screenshot: Chat view in Unsloth Studio]

To update Unsloth or install a new Unsloth Studio, use the commands below.
For the latest release, make sure your version is 2026.6.8 or v0.1.47-beta.

macOS, Linux, WSL:

curl -fsSL https://unsloth.ai/install.sh | sh

Windows:

irm https://unsloth.ai/install.ps1 | iex
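
After installing, the version requirement above can be checked from Python. This is a minimal sketch, assuming the package is distributed under the name `unsloth` and that its version string begins with dotted integers such as `2026.6.8`; the helper names `parse_version` and `is_up_to_date` are illustrative and not part of Unsloth.

```python
from importlib.metadata import PackageNotFoundError, version

MINIMUM = (2026, 6, 8)  # version named in these release notes


def parse_version(text):
    """Turn '2026.6.8' into (2026, 6, 8), ignoring any non-numeric suffix."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first piece with no leading digits
        parts.append(int(digits))
    return tuple(parts)


def is_up_to_date(installed, minimum=MINIMUM):
    """True when the installed version is at least the minimum."""
    return parse_version(installed) >= minimum


if __name__ == "__main__":
    try:
        installed = version("unsloth")  # assumed distribution name
    except PackageNotFoundError:
        print("unsloth is not installed in this environment")
    else:
        state = "up to date" if is_up_to_date(installed) else "older than 2026.6.8"
        print(f"unsloth {installed}: {state}")
```

Comparing integer tuples rather than raw strings avoids the case where "2026.10.1" sorts before "2026.6.8" alphabetically.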

Better context length algorithm

In #6312 and #6447, we improved how Unsloth Studio estimates memory usage and chooses a context length, achieving 3x longer context overall:

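Scenario                        KV cache   Context before   Context after
1x 32GB pipeline (~31 GB free)  f16                23,040          64,000
1x 32GB pipeline (~31 GB free)  q8_0               43,520         114,944
1x 32GB pipeline (~31 GB free)  q4_0               82,432         199,680
2x 32GB pipeline                any               262,144         262,144
2x 24GB tensor (~23 GB free)    f16               134,049         262,144
2x 24GB tensor (~23 GB free)    q8_0              252,329         262,144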
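
The per-row improvement can be read off the table above by dividing the new context length by the old one. The short sketch below does that arithmetic; the numbers are copied from the table, while the names `ROWS`, `gain` and `GAINS` are only illustrative.

```python
# (scenario, KV cache type, context before, context after), copied from the table
ROWS = [
    ("1x 32GB pipeline", "f16", 23_040, 64_000),
    ("1x 32GB pipeline", "q8_0", 43_520, 114_944),
    ("1x 32GB pipeline", "q4_0", 82_432, 199_680),
    ("2x 32GB pipeline", "any", 262_144, 262_144),
    ("2x 24GB tensor", "f16", 134_049, 262_144),
    ("2x 24GB tensor", "q8_0", 252_329, 262_144),
]


def gain(before, after):
    """Ratio of the new context length to the old one."""
    return after / before


# Map each (scenario, KV type) pair to its gain, rounded to two decimals.
GAINS = {(s, kv): round(gain(b, a), 2) for s, kv, b, a in ROWS}

for (scenario, kv), ratio in GAINS.items():
    print(f"{scenario:<18} {kv:<5} {ratio:.2f}x")
```

The largest gains are on the single 32GB GPU, at roughly 2.4x to 2.8x depending on the KV cache type. Every multi-GPU row ends at 262,144, so the rows that were already close to that figure show little or no change.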

Chat Canvas, Forking & Queueing

  • Edit assistant messages in place and re-run from any point in the thread.
  • Fork a thread to branch a conversation without losing the original.
  • Temporary (incognito) chats that leave nothing behind.
  • Queue new prompts while a generation is still running instead of waiting.
  • Chat "artifacts" are now canvas: inline HTML canvas cards auto-render, there is a Code view, and DiffusionGemma keeps its raw code visible inline instead of collapsing.
  • Chat search now covers every message and surfaces your own messages first.

Hub (Redesigned)

  • Full-page Hub with a trending feed, search, and support for custom model paths.
  • README preview in a split-view feed so you can read before you download.
  • Downloads default to the faster Xet transport, with automatic HTTP fallback if a transfer stalls.
  • New "Load on selection" toggle to set load options before a model loads.
  • Google logo shown for DiffusionGemma and future Gemma derivatives.

Models & Inference

  • DeepSeek-OCR and more vision models now load and run without errors.
  • Fixed fast inference on the latest vLLM (0.22+) so speed-ups work again.
  • Tensor parallelism is more reliable: if the faster MTP path fails, it now recovers on its own instead of crashing.
  • DiffusionGemma now shows the image forming live as it denoises, with accurate speed stats.

Security & Cloudflare Encrypted Studios

  • New --secure Cloudflare-only mode for end-to-end encrypted studios, with server-side tools staying enabled under --secure. Use unsloth studio --secure!
  • Bypass Permissions mode to skip confirmations and disable the tool sandbox when you want it.
  • Auto-detects Hugging Face virus-scan results and dangerous files in repos.

Logging and API

  • New API server monitor in Studio.
  • Faster API calls with lower latency.
  • Streamlined logs that now report throughput and latency, with much of the bloat removed.

Hardware & Backend

  • Better support for Blackwell RTX 50X and 60X GPUs.
  • Fixed silent downgrading to the CPU instead of the GPU.
  • torchao version is now selected from the installed torch.
  • Installer now auto-repairs a broken or CPU-only PyTorch install and warns on silent CPU fallback, across NVIDIA + AMD on Win/Linux/Mac/WSL.
  • Frees the chat model's VRAM when training starts, but only when the GPU is actually tight (no needless reloads otherwise).
  • If llama-server hard-crashes at startup, Studio now steps through a recovery ladder instead of just failing.
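
These notes do not spell out the rungs of the recovery ladder. As an illustration of the general pattern only, the sketch below tries a list of progressively more conservative launch settings until one starts. The settings, the `launch` callable and the `StartupCrash` exception are all hypothetical and do not describe Studio's actual implementation.

```python
class StartupCrash(RuntimeError):
    """Hypothetical error raised when a launch attempt dies at startup."""


def start_with_ladder(launch, ladder):
    """Try each rung in order; return (rung, handle) for the first that starts.

    `launch` is any callable that takes a settings dict and either returns a
    handle or raises StartupCrash. If every rung fails, a StartupCrash is
    raised that chains the last failure.
    """
    last_error = None
    for rung in ladder:
        try:
            return rung, launch(rung)
        except StartupCrash as err:
            last_error = err  # remember why this rung failed, then step down
    raise StartupCrash(f"all {len(ladder)} rungs failed") from last_error


# Hypothetical rungs, ordered from most to least ambitious.
LADDER = [
    {"context": 64_000, "gpu_offload": "full"},
    {"context": 32_000, "gpu_offload": "full"},
    {"context": 8_192, "gpu_offload": "partial"},
]


def fake_launch(settings):
    """Stand-in launcher: pretends anything above 32,000 tokens crashes."""
    if settings["context"] > 32_000:
        raise StartupCrash(f"crashed with context={settings['context']}")
    return f"server(context={settings['context']})"


rung, handle = start_with_ladder(fake_launch, LADDER)
print(rung, handle)
```

Ordering the rungs from most to least ambitious means the first success is also the best configuration that works, and chaining the last error keeps the original failure visible if nothing starts.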

Training & General Fixes & Parallel Modules

  • MLX training updates.
  • Improved GRPO training reliability with vLLM.
  • Training startup made more reliable, with clearer errors for invalid VLM batches.
  • Studio now cleans up leftover backend processes more reliably after crashes, restarts, or interrupted shutdowns.
  • Export, Chat, Training and Recipes now run as separate, isolated modules, so all 4 can run in parallel. You can chat or run inference while you wait for a training run or an export!

Full Changelog: v0.1.46-beta...v0.1.47-beta
