github oobabooga/text-generation-webui v4.6


Changes

  • Tool call confirmation: Add inline approve/reject/always-approve buttons that appear before each tool call is executed. Enable via the new "Confirm tool calls" checkbox in the Chat tab.
  • Stdio MCP server support: In addition to HTTP MCP servers, you can now configure local subprocess-based MCP servers via user_data/mcp.json, using the same format as Claude Desktop and Cursor. [Tutorial]
  • preserve_thinking chat template parameter: New UI checkbox and --preserve-thinking CLI flag to control whether thinking blocks from prior turns are kept in the context.
  • UI: Sidebar overhaul: sidebars now toggle independently and persist their state across page refreshes. Default visibility adapts to the viewport width.
  • llama.cpp: Pass --draft-min 48 by default for draftless speculative decoding.
  • Only show the "Reasoning effort" and "Enable thinking" controls for models whose chat template actually uses them.
  • Cache MCP tool discovery to avoid re-querying servers on each generation.
  • Add model download branch handling in download_model_wrapper (#7506). Thanks, @Th-Underscore.
  • UI: Improve border colors in light theme, fix code block copy button colors and centering, fix code block scrollbar flash during page load, improve past chats menu spacing.

Security

  • Fix SSRF vulnerabilities in URL fetching: reject URLs containing backslashes or userinfo, and validate every redirect hop rather than only the initial URL.
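The kinds of checks described above can be sketched in Python (illustrative only, not the project's actual code; the function name is hypothetical):

```python
from urllib.parse import urlsplit

def is_safe_url(url: str) -> bool:
    # Reject backslashes outright: some HTTP clients and parsers treat "\"
    # like "/", so "https://trusted.example\@evil.example/" can be parsed
    # as pointing at a different host than a naive check expects.
    if "\\" in url:
        return False
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    # Reject userinfo ("user:pass@host"), another host-confusion vector.
    if parts.username is not None or parts.password is not None:
        return False
    return True
```

Validating every redirect hop then means re-running checks like these on each Location header before following it, instead of trusting the first URL only.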

Bug fixes

  • Fix Gemma 4 thinking tags not hidden after tool calls (#7509).
  • Fix GPT-OSS channel tokens leaking in UI after tool calls.

Dependency updates

Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip/extract, and run.

Note

NVIDIA GPU: If nvidia-smi reports CUDA Version >= 13.1, use the cuda13.1 build. Otherwise, use cuda12.4.
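The version check above can be scripted; a sketch that parses the CUDA version out of nvidia-smi's banner line (the sample line here stands in for real nvidia-smi output, with illustrative values):

```shell
# In practice, replace the sample line with: line=$(nvidia-smi | head -n 3 | tail -n 1)
line="| NVIDIA-SMI 550.54    Driver Version: 550.54    CUDA Version: 12.4     |"
ver=$(echo "$line" | grep -oE 'CUDA Version: [0-9]+\.[0-9]+' | grep -oE '[0-9]+\.[0-9]+')
# Pick the build matching the rule above: >= 13.1 gets cuda13.1, else cuda12.4.
awk -v v="$ver" 'BEGIN { print (v >= 13.1) ? "cuda13.1" : "cuda12.4" }'
```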

ik_llama.cpp is a llama.cpp fork with new quant types. If unsure, use the llama.cpp column.

Windows

GPU/Platform llama.cpp ik_llama.cpp
NVIDIA (CUDA 12.4) Download (766 MB) Download (1.1 GB)
NVIDIA (CUDA 13.1) Download (686 MB) Download (1.19 GB)
AMD/Intel (Vulkan) Download (196 MB)
AMD (ROCm 7.2) Download (499 MB)
CPU only Download (178 MB) Download (194 MB)

Linux

GPU/Platform llama.cpp ik_llama.cpp
NVIDIA (CUDA 12.4) Download (747 MB) Download (1.09 GB)
NVIDIA (CUDA 13.1) Download (696 MB) Download (1.21 GB)
AMD/Intel (Vulkan) Download (208 MB)
AMD (ROCm 7.2) Download (307 MB)
CPU only Download (190 MB) Download (217 MB)

macOS

Architecture llama.cpp
Apple Silicon (arm64) Download (156 MB)
Intel (x86_64) Download (162 MB)

Updating a portable install:

  1. Download and extract the latest version.
  2. Replace the user_data folder in the new install with the one from your existing install. All your settings and models will carry over.

Starting with 4.0, you can also move user_data one folder up, next to the install folder. It will be detected automatically, making updates easier:

text-generation-webui-4.0/
text-generation-webui-4.1/
user_data/                    <-- shared by both installs
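The shared layout above can be set up from a shell; a sketch assuming 4.5 was the previous install (the demo directory and version numbers are placeholders):

```shell
# Stand-in directory tree for the example; your installs already exist.
mkdir -p demo/text-generation-webui-4.5/user_data demo/text-generation-webui-4.6
cd demo
# Move user_data up one level, next to the install folders;
# both installs will detect and share it there.
mv text-generation-webui-4.5/user_data .
ls
```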
