github unslothai/unsloth v0.1.38-beta
New Unsloth API Inference Endpoint

6 hours ago

You can use local LLMs with tools like Claude Code and Codex by connecting them to Unsloth’s API endpoint. This lets you run models like Qwen and Gemma locally, with additional features such as self-healing tool calling, code execution, and web search. Unsloth makes it easy to deploy a fast local API inference endpoint.

Models loaded in Unsloth (including GGUFs) are exposed as an authenticated API via llama-server. For security, a long API key is generated, similar to how OpenAI issues one. Your local models can then be used directly in your preferred AI agent, SDK, or chat client. Unsloth speaks two dialects on the same port:

  • Anthropic-compatible /v1/messages for Claude Code, OpenClaw, the Anthropic SDK, and any client that expects the Messages API.
  • OpenAI-compatible /v1/chat/completions and /v1/responses for the OpenAI SDK, OpenCode, Cursor, Continue, Cline, Open WebUI, SillyTavern, and any OpenAI-compatible tool.
  • Both support streaming, tool calling (OpenAI tools / Anthropic tools), and vision inputs.
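Because both dialects live on one port, a client only has to switch the path and payload shape. Here is a minimal sketch of the two request formats; the base URL, model name, and environment-variable names are placeholders for whatever your Unsloth deployment actually uses:

```python
import json
import os

# Hypothetical defaults: the real host/port and API key come from your Unsloth deployment.
BASE_URL = os.environ.get("UNSLOTH_BASE_URL", "http://localhost:8080")
API_KEY = os.environ.get("UNSLOTH_API_KEY", "sk-local-example")

def openai_chat_request(model, prompt):
    """Build an OpenAI-style /v1/chat/completions request: (url, headers, payload)."""
    return (
        f"{BASE_URL}/v1/chat/completions",
        {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
        {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": True,  # both dialects support streaming
        },
    )

def anthropic_messages_request(model, prompt):
    """Build an Anthropic-style /v1/messages request; note the x-api-key header
    and the required max_tokens field in the Messages API."""
    return (
        f"{BASE_URL}/v1/messages",
        {
            "x-api-key": API_KEY,
            "anthropic-version": "2023-06-01",
            "Content-Type": "application/json",
        },
        {
            "model": model,
            "max_tokens": 256,
            "messages": [{"role": "user", "content": prompt}],
        },
    )

if __name__ == "__main__":
    url, headers, payload = openai_chat_request("qwen3", "Hello!")
    print(url)
    print(json.dumps(payload, indent=2))
```

A tool that speaks only one dialect (say, the OpenAI SDK) just needs its base URL pointed at this endpoint with the generated key as the API key; no changes on the server side are required.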

New models

We've also added a handful of new models to run, including NVIDIA Nemotron 3 Nano Omni, IBM Granite 4.1, and Mistral 3.5 Medium. We helped Mistral resolve some implementation issues in transformers and in the GGUFs.

Unsloth Updates

  • Stopped Studio training runs can now resume from checkpoints.
  • Chat threads now autosave and persist more reliably.
  • DPO training hangs in multi-process setups were fixed.
  • VLM GRPO support improved with MROPE updates.
  • Studio’s stop button now properly stops generation.
  • Fixed the chat template disappearing after a browser refresh.

What's Changed in Unsloth

What's Changed in Unsloth-Zoo

Full Changelog: v0.1.37-beta...v0.1.38-beta
