## Changes
- MCP server support: Use remote MCP servers from the UI. Just add one server URL per line in the new "MCP servers" field in the Chat tab and send a message. Tools will be discovered automatically and used alongside local tools. [Tutorial]
- Several UI improvements, further modernizing the theme:
  - Improve hover menu appearance in the Chat tab.
  - Improve scrollbar styling (thinner, more rounded).
  - Improve message text contrast and heading colors.
  - Improve message action icon visibility in light mode.
  - Make blockquote, table, and hr borders more subtle and consistent.
  - Improve accordion outline styling.
  - Reduce empty space between the chat input and message contents.
  - Hide spin buttons on all sliders (these looked ugly on Windows).
  - Show a filename tooltip on file attachments in the chat input.
- Add Windows + ROCm portable builds.
- Image generation: Embed metadata in API responses. PNG images returned by the API now include generation settings (model, seed, dimensions, steps, CFG scale, sampler) in the file metadata.
- API: Add `instruction_template` and `instruction_template_str` parameters to the model load endpoint.
- API: Remove the deprecated `settings` parameter from the model load endpoint.
- Move the `cpu-moe` checkbox to extra flags (no longer needed now that `--fit` exists).
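The generation settings embedded in API-returned PNGs can be read back without extra dependencies. A minimal stdlib sketch, assuming the values are stored as standard PNG `tEXt` chunks (the helper name is hypothetical, not part of the project):

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def read_png_text_chunks(data: bytes) -> dict:
    """Collect key/value pairs from tEXt chunks in a PNG byte string."""
    assert data[:8] == PNG_SIGNATURE, "not a PNG file"
    pos, metadata = 8, {}
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            # tEXt layout: keyword, NUL separator, Latin-1 text
            key, _, value = body.partition(b"\x00")
            metadata[key.decode("latin-1")] = value.decode("latin-1")
        pos += 12 + length  # 4 (length) + 4 (type) + data + 4 (CRC)
        if ctype == b"IEND":
            break
    return metadata
```

If Pillow is installed, `Image.open(path).text` exposes the same text chunks as a dict.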
## Bug fixes
- Fix inline LaTeX rendering: `$...$` expressions are now protected from being parsed as markdown (#7423).
- Fix crash when truncating prompts with tool call messages.
- Fix "address already in use" on server restart (Linux/macOS).
- Fix GPT-OSS reasoning tags briefly leaking into streamed output between thinking and tool calls.
- Fix tool call check sometimes truncating visible text at end of generation.
- Fix image generation failing with Flash Attention 2 errors by defaulting attention to SDPA.
- Fix loader args leaking between sequential API model loads.
- Fix IPv6 address formatting in the API.
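For context on the IPv6 fix: a URL authority must bracket IPv6 literals, since bare colons would otherwise be read as the port separator. An illustrative sketch (the helper name is hypothetical, not the project's code):

```python
def format_host(host: str, port: int) -> str:
    """Build a URL authority, bracketing IPv6 literals as RFC 3986 requires."""
    if ":" in host:  # a bare colon only appears in IPv6 literals
        return f"[{host}]:{port}"
    return f"{host}:{port}"

# format_host("::1", 5000)       -> "[::1]:5000"
# format_host("127.0.0.1", 5000) -> "127.0.0.1:5000"
```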
## Dependency updates
- Update llama.cpp to ggml-org/llama.cpp@d0a6dfe
- Update ik_llama.cpp to ikawrakow/ik_llama.cpp@67fc9c5 (adds Gemma 4 support)
## Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip/extract, and run.
**Note**
- NVIDIA GPU: If `nvidia-smi` reports a CUDA Version >= 13.1, use the cuda13.1 build. Otherwise, use the cuda12.4 build.
- ik_llama.cpp is a llama.cpp fork with new quant types. If unsure, use the llama.cpp column.
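The version check in the note can be scripted. A sketch that assumes the usual `nvidia-smi` banner format (the example banner line is made up; substitute the real command output):

```python
import re

# Example banner line; run `nvidia-smi` and use its real output instead.
banner = "| NVIDIA-SMI 580.65    Driver Version: 580.65    CUDA Version: 13.1 |"

match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", banner)
major, minor = map(int, match.groups())
build = "cuda13.1" if (major, minor) >= (13, 1) else "cuda12.4"
print(build)  # -> cuda13.1 for this example banner
```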
### Windows
| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (777 MB) | Download (1.09 GB) |
| NVIDIA (CUDA 13.1) | Download (698 MB) | Download (1.19 GB) |
| AMD/Intel (Vulkan) | Download (207 MB) | — |
| AMD (ROCm 7.2) | Download (516 MB) | — |
| CPU only | Download (191 MB) | Download (192 MB) |
### Linux
| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (761 MB) | Download (1.09 GB) |
| NVIDIA (CUDA 13.1) | Download (712 MB) | Download (1.21 GB) |
| AMD/Intel (Vulkan) | Download (223 MB) | — |
| AMD (ROCm 7.2) | Download (329 MB) | — |
| CPU only | Download (207 MB) | Download (217 MB) |
### macOS
| Architecture | llama.cpp |
|---|---|
| Apple Silicon (arm64) | Download (181 MB) |
| Intel (x86_64) | Download (187 MB) |
Updating a portable install:
- Download and extract the latest version.
- Replace the `user_data` folder with the one from your existing install. All your settings and models will be carried over.
Starting with 4.0, you can also move `user_data` one folder up, next to the install folder. It will be detected automatically, making updates easier:

    text-generation-webui-4.0/
    text-generation-webui-4.1/
    user_data/   <-- shared by both installs
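The shared layout above can be set up with a single move. A sketch using the example folder names (adjust to your actual install versions; the `mkdir` only stands in for an existing install):

```shell
# Stand-in for an existing portable install with settings in user_data.
mkdir -p text-generation-webui-4.0/user_data

# Move user_data up one level so all 4.0+ installs detect and share it.
mv text-generation-webui-4.0/user_data .
ls -d user_data
```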