You can use local LLMs with tools like Claude Code and Codex by connecting them to Unsloth’s API endpoint. This lets you run models like Qwen and Gemma locally, with additional features such as self-healing tool calling, code execution, and web search. Unsloth makes it easy to deploy a fast API inference endpoint that provides:
- Self-healing tool calling, which helps reduce broken or malformed tool calls by 50%.
- Code execution support, allowing Bash and Python execution for more accurate code outputs.
- Advanced web search that actually visits and reads webpages to gather in-depth information.
- Automatic inference settings for GGUF models (temperature, top-k, etc.).
Models loaded in Unsloth (including GGUFs) are exposed as an authenticated API via llama-server. A long API key is generated for security, similar to how OpenAI provides one. Your local models can then be used directly in your preferred AI agent, SDK, or chat client. Unsloth speaks two dialects on the same port:
- Anthropic-compatible: `/v1/messages` for Claude Code, OpenClaw, the Anthropic SDK, and any client that expects the Messages API.
- OpenAI-compatible: `/v1/chat/completions` and `/v1/responses` for the OpenAI SDK, OpenCode, Cursor, Continue, Cline, Open WebUI, SillyTavern, and any OpenAI-compatible tool.
- Both dialects support streaming, tool calling (OpenAI tools / Anthropic tools), and vision inputs.
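As a quick illustration, here is a minimal sketch of calling both dialects from Python. The port, model name, and key below are placeholders, not Unsloth defaults; use the endpoint URL and the generated API key that the server prints when it starts.

```python
# Minimal sketch: one local Unsloth server, two client dialects.
# Placeholder values: port 8000, model name "local-model", and the key.
from openai import OpenAI
from anthropic import Anthropic

BASE = "http://127.0.0.1:8000"  # local endpoint (placeholder port)
KEY = "sk-unsloth-placeholder"  # stand-in for the long generated key

# OpenAI dialect: /v1/chat/completions
oai = OpenAI(base_url=f"{BASE}/v1", api_key=KEY)
resp = oai.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Say hello from a local model."}],
)
print(resp.choices[0].message.content)

# Anthropic dialect: /v1/messages (same server, same key)
ant = Anthropic(base_url=BASE, api_key=KEY)
msg = ant.messages.create(
    model="local-model",
    max_tokens=128,
    messages=[{"role": "user", "content": "Say hello via the Messages API."}],
)
print(msg.content[0].text)
```

Agent tools work the same way: point Claude Code at the Anthropic base URL and OpenAI-compatible clients at `/v1`, passing the generated key as the API key.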
New models
We also have a handful of new models to run, including NVIDIA Nemotron 3 Nano Omni, IBM Granite 4.1, and Mistral 3.5 Medium. We helped Mistral solve some implementation issues in transformers and in the GGUFs.
Unsloth Updates
- Stopped Studio training runs can now resume from checkpoints.
- Chat threads now autosave and persist more reliably.
- DPO training hangs in multi-process setups were fixed.
- VLM GRPO support improved with MROPE updates.
- Studio’s stop button now properly stops generation.
- Chat templates no longer disappear after a browser refresh.
What's Changed in Unsloth
- Studio: use (gguf) context length before max seq length by @G07cha in #5111
- chore: fix typo cleanup across tests and backend strings by @luojiyin1987 in #5152
- fix: guard resolve_model_class fallback against unresolvable transformers AutoModel entries by @Etherll in #5155
- Studio: kill in-flight llama-server before spawning a new one by @danielhanchen in #5171
- Studio: stop currency escape from breaking inline LaTeX by @danielhanchen in #5170
- Studio: probe AMD GPUs in llama-server VRAM detection by @danielhanchen in #5172
- Studio: make stop button actually stop generation by @danielhanchen in #5069
- Studio: add github_repo seed reader and GitHub Support Bot recipe by @danielhanchen in #5169
- fix(studio): use endswith for mmproj F16 variant selection by @LeoBorcherding in #5184
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci[bot] in #5204
- Fix Windows install when paths contain spaces or Python 3.14 is on PATH by @Etherll in #5201
- Studio: Preserve transparency in uploaded profile avatars by @Imagineer99 in #5200
- UX: single chat header error placement and selector alignment by @Imagineer99 in #5173
- Studio: Refine chat preset and group built-in presets by @Imagineer99 in #5159
- Studio: Fix image-only chat requests failing validation by @Imagineer99 in #5212
- Studio: fix 7 failing studio_unit_tests on main by @danielhanchen in #5216
- Patch checkpoint reload init functions to strip unsupported args by @Datta0 in #5167
- Studio: Fix clipped model selector text descenders by @Imagineer99 in #5210
- Fix DPO trainer multi process hang by @Datta0 in #5199
- Studio: Pin assistant-ui core for fresh installs by @Imagineer99 in #5229
- Fix local model scanner to handle ollama cloud models by @Anish9901 in #5220
- Fix Studio desktop tray installer and titlebar, plus bug fixes by @wasimysaid in #5179
- MROPE for VLM GRPO by @Datta0 in #5198
- install: overlay unsloth-zoo from git main on --local by @rolandtannous in #5242
- Studio: Fix chat template disappearing after browser refresh by @Imagineer99 in #5209
- studio: add --local to setup.sh + overlay unsloth-zoo from git main by @rolandtannous in #5252
- Fix/windowsprebuilt by @mmathew23 in #5241
- Studio: Add dataset upload dropzone and update preserve think copy by @Imagineer99 in #5253
- Add Qwen3.6 support by @rolandtannous in #5257
- Studio: Chat thread autosave persistence by @Imagineer99 in #5256
- Studio: Enable deleting fine-tuned chat models by @Imagineer99 in #5234
- Studio: Add checkpoint resume for stopped training runs by @Imagineer99 in #5255
- Studio: Polish spacing and profile input radius by @Imagineer99 in #5222
- Fix check for libcurl headers in install.sh by @LFd3v in #5251
- Default Studio host to 127.0.0.1 and prompt before auto-start by @rolandtannous in #5267
- Studio: forward llama-server args from `unsloth studio run`, activate `unsloth run`, and allow passing model:quant to load models by @rolandtannous in #5271
- Studio: Always show API usage examples and docs links by @Imagineer99 in #5270
- Studio: Change API Keys settings to API Access by @Imagineer99 in #5268
- unsloth run: add --enable-tools/--disable-tools server-side tool policy by @rolandtannous in #5277
- fix: use % 8 instead of // 8 in FP8 weight shape check by @Ricardo-M-L in #5243
- Pin Studio GGUF export to llama.cpp's local convert script by @mmathew23 in #5275
- fix KVCache estimates for gemma4 style sliding window models by @Datta0 in #5225
- Update VRAM estimator to cater to broader model configs by @Datta0 in #5175
- Fix FastSentenceTransformer loading with newer sentence-transformers by @Etherll in #5259
- Studio: Preserve chat history during autosave by @Imagineer99 in #5278
What's Changed in Unsloth-Zoo
- Fix fused CE grad scaling under DDP by @danielhanchen in unslothai/unsloth-zoo#434
- Fused CE backward: guard scaling=0, drop tensor path, use out-of-place mul by @mmathew23 in unslothai/unsloth-zoo#610
- Fix/gemma4moefix by @mmathew23 in unslothai/unsloth-zoo#612
- MROPE for VLM GRPO by @Datta0 in unslothai/unsloth-zoo#614
- Double-buffer GPU activations for overlapping H2D copy with backward compute by @ruixiang63 in unslothai/unsloth-zoo#534
- fix(temporary_patches/utils): add missing comma in all (raise_error / Unpack) by @Anai-Guo in unslothai/unsloth-zoo#617
- Fix qwen lora extractor for diff peft versions by @Datta0 in unslothai/unsloth-zoo#618
- fix: use backend device type in GGUF merge path by @andomeder in unslothai/unsloth-zoo#615
- Add unsloth_compiled_cache to gitignore by @Datta0 in unslothai/unsloth-zoo#622
- Allow local convert_hf_to_gguf.py via UNSLOTH_LLAMA_CPP_SCRIPTS_DIR by @mmathew23 in unslothai/unsloth-zoo#621
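As one concrete example of that last item, here is a minimal sketch of the `UNSLOTH_LLAMA_CPP_SCRIPTS_DIR` override; the path is a placeholder for your own local llama.cpp checkout:

```python
import os

# Placeholder path: point this at a local llama.cpp checkout so GGUF
# export uses its convert_hf_to_gguf.py rather than a fetched copy.
# Set this before running the Unsloth export step.
os.environ["UNSLOTH_LLAMA_CPP_SCRIPTS_DIR"] = "/path/to/llama.cpp"
```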
Full Changelog: v0.1.37-beta...v0.1.38-beta