github oobabooga/text-generation-webui v4.4
v4.4 - MCP server support!


Changes

  • MCP server support: Use remote MCP servers from the UI. Just add one server URL per line in the new "MCP servers" field in the Chat tab and send a message. Tools will be discovered automatically and used alongside local tools. [Tutorial]
  • Several UI improvements, further modernizing the theme:
    • Improve hover menu appearance in the Chat tab.
    • Improve scrollbar styling (thinner, more rounded).
    • Improve message text contrast and heading colors.
    • Improve message action icon visibility in light mode.
    • Make blockquote, table, and hr borders more subtle and consistent.
    • Improve accordion outline styling.
    • Reduce empty space between chat input and message contents.
    • Hide spin buttons on all sliders (these looked ugly on Windows).
    • Show filename tooltip on file attachments in the chat input.
  • Add Windows + ROCm portable builds.
  • Image generation: Embed metadata in API responses. PNG images returned by the API now include generation settings (model, seed, dimensions, steps, CFG scale, sampler) in the file metadata.
  • API: Add instruction_template and instruction_template_str parameters in the model load endpoint.
  • API: Remove the deprecated settings parameter from the model load endpoint.
  • Move the cpu-moe checkbox to extra flags (no longer needed now that --fit exists).
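The "MCP servers" field mentioned above takes one server URL per line. A hypothetical example of what the field might contain (these URLs are placeholders, not real endpoints):

```
http://localhost:8000/mcp
https://mcp.example.com/mcp
```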
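To illustrate how generation settings can ride along inside a PNG returned by the API, here is a stdlib-only sketch that builds a minimal PNG containing a tEXt metadata chunk and reads the key/value pairs back. The `seed` key and values are illustrative; the webui's actual metadata keys and storage format may differ.

```python
import struct
import zlib

def make_chunk(ctype: bytes, data: bytes) -> bytes:
    # A PNG chunk is: 4-byte length, 4-byte type, data, 4-byte CRC over type+data.
    return (struct.pack(">I", len(data)) + ctype + data
            + struct.pack(">I", zlib.crc32(ctype + data)))

def read_text_chunks(png: bytes) -> dict:
    """Extract tEXt key/value pairs from a PNG byte stream."""
    assert png[:8] == b"\x89PNG\r\n\x1a\n", "not a PNG"
    out, pos = {}, 8
    while pos < len(png):
        (length,) = struct.unpack_from(">I", png, pos)
        ctype = png[pos + 4: pos + 8]
        data = png[pos + 8: pos + 8 + length]
        if ctype == b"tEXt":
            # tEXt payload is: keyword, NUL separator, Latin-1 text.
            key, _, value = data.partition(b"\x00")
            out[key.decode("latin-1")] = value.decode("latin-1")
        pos += 12 + length  # 4 (length) + 4 (type) + data + 4 (CRC)
    return out

# Build a minimal 1x1 grayscale PNG carrying a hypothetical generation setting.
ihdr = make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
text = make_chunk(b"tEXt", b"seed\x00" + b"123456")
idat = make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))  # filter byte + 1 pixel
iend = make_chunk(b"IEND", b"")
png = b"\x89PNG\r\n\x1a\n" + ihdr + text + idat + iend

print(read_text_chunks(png))  # {'seed': '123456'}
```

The same parsing approach works on any PNG downloaded from the API, since tEXt chunks survive as long as the file is not re-encoded.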

Bug fixes

  • Fix inline LaTeX rendering: $...$ expressions are now protected from being parsed as markdown (#7423).
  • Fix crash when truncating prompts with tool call messages.
  • Fix "address already in use" on server restart (Linux/macOS).
  • Fix GPT-OSS reasoning tags briefly leaking into streamed output between thinking and tool calls.
  • Fix tool call check sometimes truncating visible text at end of generation.
  • Fix image generation failing with Flash Attention 2 errors by defaulting attention to SDPA.
  • Fix loader args leaking between sequential API model loads.
  • Fix IPv6 address formatting in the API.
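On the IPv6 formatting point: RFC 3986 requires IPv6 literals to be wrapped in square brackets when they appear in a URL authority, since the colons would otherwise be ambiguous with the port separator. A minimal sketch of that rule (illustrative, not the project's actual code):

```python
import ipaddress

def format_host_port(host: str, port: int) -> str:
    """Return host:port for a URL, bracketing IPv6 literals per RFC 3986."""
    try:
        is_v6 = isinstance(ipaddress.ip_address(host), ipaddress.IPv6Address)
    except ValueError:
        is_v6 = False  # a hostname, not an address literal
    return f"[{host}]:{port}" if is_v6 else f"{host}:{port}"

print(format_host_port("127.0.0.1", 5000))  # 127.0.0.1:5000
print(format_host_port("::1", 5000))        # [::1]:5000
```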

Dependency updates


Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip/extract, and run.

Note

NVIDIA GPU: If nvidia-smi reports CUDA Version >= 13.1, use the cuda13.1 build. Otherwise, use cuda12.4.

ik_llama.cpp is a llama.cpp fork with new quant types. If unsure, use the llama.cpp column.

Windows

GPU/Platform        | llama.cpp         | ik_llama.cpp
NVIDIA (CUDA 12.4)  | Download (777 MB) | Download (1.09 GB)
NVIDIA (CUDA 13.1)  | Download (698 MB) | Download (1.19 GB)
AMD/Intel (Vulkan)  | Download (207 MB) | n/a
AMD (ROCm 7.2)      | Download (516 MB) | n/a
CPU only            | Download (191 MB) | Download (192 MB)

Linux

GPU/Platform        | llama.cpp         | ik_llama.cpp
NVIDIA (CUDA 12.4)  | Download (761 MB) | Download (1.09 GB)
NVIDIA (CUDA 13.1)  | Download (712 MB) | Download (1.21 GB)
AMD/Intel (Vulkan)  | Download (223 MB) | n/a
AMD (ROCm 7.2)      | Download (329 MB) | n/a
CPU only            | Download (207 MB) | Download (217 MB)

macOS

Architecture          | llama.cpp
Apple Silicon (arm64) | Download (181 MB)
Intel (x86_64)        | Download (187 MB)

Updating a portable install:

  1. Download and extract the latest version.
  2. Replace the user_data folder in the new version with the one from your existing install. All your settings and models will carry over.

Starting with 4.0, you can also move user_data one folder up, next to the install folder. It will be detected automatically, making updates easier:

text-generation-webui-4.0/
text-generation-webui-4.1/
user_data/                    <-- shared by both installs
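The automatic detection described above can be sketched as follows. This is an illustrative reimplementation, not the webui's actual code, and `find_user_data` is a hypothetical name:

```python
import tempfile
from pathlib import Path

def find_user_data(install_dir: Path) -> Path:
    """Prefer user_data inside the install folder; fall back to a
    shared user_data sitting next to the install folder."""
    local = install_dir / "user_data"
    shared = install_dir.parent / "user_data"
    if local.is_dir():
        return local
    if shared.is_dir():
        return shared
    return local  # default location, created on first run

# Demo: a shared user_data next to two versioned installs.
root = Path(tempfile.mkdtemp())
(root / "text-generation-webui-4.4").mkdir()
(root / "user_data").mkdir()
print(find_user_data(root / "text-generation-webui-4.4"))  # .../user_data
```

With this lookup order, a user_data folder left inside an install still wins, so moving it up is optional rather than required.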
