## Changes
- MCP server support: Use remote MCP servers from the UI. Just add one server URL per line in the new "MCP servers" field in the Chat tab and send a message. Tools will be discovered automatically and used alongside local tools. [Tutorial]
- Several UI improvements, further modernizing the theme:
  - Improve hover menu appearance in the Chat tab.
  - Improve scrollbar styling (thinner, more rounded).
  - Improve message text contrast and heading colors.
  - Improve message action icon visibility in light mode.
  - Make blockquote, table, and hr borders more subtle and consistent.
  - Improve accordion outline styling.
  - Reduce empty space between the chat input and message contents.
  - Hide spin buttons on all sliders (these looked ugly on Windows).
  - Show a filename tooltip on file attachments in the chat input.
- Add Windows + ROCm portable builds.
- Image generation: Embed metadata in API responses. PNG images returned by the API now include generation settings (model, seed, dimensions, steps, CFG scale, sampler) in the file metadata.
- API: Add `instruction_template` and `instruction_template_str` parameters to the model load endpoint.
- API: Remove the deprecated `settings` parameter from the model load endpoint.
- Move the `cpu-moe` checkbox to extra flags (no longer needed now that `--fit` exists).
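The generation settings embedded in API-returned PNGs can be read back without extra dependencies. A minimal stdlib sketch, assuming the values are stored as standard PNG `tEXt` chunks (the helper name is hypothetical, not part of the project):

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def read_png_text_chunks(data: bytes) -> dict:
    """Collect key/value pairs from tEXt chunks in a PNG byte string."""
    assert data[:8] == PNG_SIGNATURE, "not a PNG file"
    pos, metadata = 8, {}
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            # tEXt layout: keyword, NUL separator, Latin-1 text
            key, _, value = body.partition(b"\x00")
            metadata[key.decode("latin-1")] = value.decode("latin-1")
        pos += 12 + length  # 4 (length) + 4 (type) + data + 4 (CRC)
        if ctype == b"IEND":
            break
    return metadata
```

If Pillow is installed, `Image.open(path).text` exposes the same text chunks as a dict.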
## Bug fixes
- Fix inline LaTeX rendering: `$...$` expressions are now protected from being parsed as markdown (#7423).
- Fix crash when truncating prompts with tool call messages.
- Fix "address already in use" on server restart (Linux/macOS).
- Fix GPT-OSS reasoning tags briefly leaking into streamed output between thinking and tool calls.
- Fix tool call check sometimes truncating visible text at end of generation.
- Fix image generation failing with Flash Attention 2 errors by defaulting attention to SDPA.
- Fix loader args leaking between sequential API model loads.
- Fix IPv6 address formatting in the API.
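For context on the IPv6 fix: a URL authority must bracket IPv6 literals, since bare colons would otherwise be read as the port separator. An illustrative sketch (the helper name is hypothetical, not the project's code):

```python
def format_host(host: str, port: int) -> str:
    """Build a URL authority, bracketing IPv6 literals as RFC 3986 requires."""
    if ":" in host:  # a bare colon only appears in IPv6 literals
        return f"[{host}]:{port}"
    return f"{host}:{port}"

# format_host("::1", 5000)       -> "[::1]:5000"
# format_host("127.0.0.1", 5000) -> "127.0.0.1:5000"
```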
## Dependency updates
- Update llama.cpp to ggml-org/llama.cpp@d0a6dfe
- Update ik_llama.cpp to ikawrakow/ik_llama.cpp@67fc9c5 (adds Gemma 4 support)
## Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip/extract, and run.
**Note**
- NVIDIA GPU: If `nvidia-smi` reports a CUDA Version >= 13.1, use the cuda13.1 build. Otherwise, use the cuda12.4 build.
- ik_llama.cpp is a llama.cpp fork with new quant types. If unsure, use the llama.cpp column.
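The version check in the note can be scripted. A sketch that assumes the usual `nvidia-smi` banner format (the example banner line is made up; substitute the real command output):

```python
import re

# Example banner line; run `nvidia-smi` and use its real output instead.
banner = "| NVIDIA-SMI 580.65    Driver Version: 580.65    CUDA Version: 13.1 |"

match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", banner)
major, minor = map(int, match.groups())
build = "cuda13.1" if (major, minor) >= (13, 1) else "cuda12.4"
print(build)  # -> cuda13.1 for this example banner
```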
### Windows
| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (777 MB) | Download (1.09 GB) |
| NVIDIA (CUDA 13.1) | Download (698 MB) | Download (1.19 GB) |
| AMD/Intel (Vulkan) | Download (207 MB) | — |
| AMD (ROCm 7.2) | Download (516 MB) | — |
| CPU only | Download (191 MB) | Download (192 MB) |
### Linux
| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (761 MB) | Download (1.09 GB) |
| NVIDIA (CUDA 13.1) | Download (712 MB) | Download (1.21 GB) |
| AMD/Intel (Vulkan) | Download (223 MB) | — |
| AMD (ROCm 7.2) | Download (329 MB) | — |
| CPU only | Download (207 MB) | Download (217 MB) |
### macOS
| Architecture | llama.cpp |
|---|---|
| Apple Silicon (arm64) | Download (181 MB) |
| Intel (x86_64) | Download (187 MB) |
Updating a portable install:
- Download and extract the latest version.
- Replace the `user_data` folder with the one from your existing install. All your settings and models will be carried over.
Starting with 4.0, you can also move `user_data` one folder up, next to the install folder. It will be detected automatically, making updates easier:

    text-generation-webui-4.0/
    text-generation-webui-4.1/
    user_data/   <-- shared by both installs
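The shared layout above can be set up with a single move. A sketch using the example folder names (adjust to your actual install versions; the `mkdir` only stands in for an existing install):

```shell
# Stand-in for an existing portable install with settings in user_data.
mkdir -p text-generation-webui-4.0/user_data

# Move user_data up one level so all 4.0+ installs detect and share it.
mv text-generation-webui-4.0/user_data .
ls -d user_data
```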