## Changes
- The project has been renamed to TextGen! The GitHub URL is now github.com/oobabooga/textgen.
- Logits display improvements (#7486). Thanks, @wiger3.
- UI: Add sky-blue color for quoted text in light mode (#7473). Thanks, @Th-Underscore.
- Reduce VRAM peak in prompt logprobs forward pass.
## Bug fixes
- Fix Gemma-4 tool calling: handle double quotes and newline chars in arguments (#7477). Thanks, @mamei16.
- Fix chat scroll getting stuck on thinking blocks (#7485).
- Prevent the tool icon SVG from shrinking when tool calls are long (#7488). Thanks, @mamei16.
- Fix: wrong chat deleted when selection changes before confirm (#7483). Thanks, @lawrence3699.
- Fix bos/eos tokens not being set for models without a chat template. Defaults are now reset before reading model metadata.
- Fix duplicate BOS token being prepended in ExLlamav3.
- Fix version metadata not syncing on Continue (#7492).
- Fix `row_split` not working with ik_llama.cpp: `--split-mode row` is now converted to `--split-mode graph` (#7489).
- Fix "Start reply with" crash (#7497). 🆕 - v4.5.1.
- Fix tool responses with Gemma 4 template (#7498). 🆕 - v4.5.1.
- UI: Fix consecutive thinking blocks rendering with Gemma 4. 🆕 - v4.5.1.
- Fix bos/eos tokens being overwritten after GGUF metadata sets them (#7496). 🆕 - v4.5.2.
## Dependency updates
- Update llama.cpp to ggml-org/llama.cpp@5d14e5d
- Update ik_llama.cpp to ikawrakow/ik_llama.cpp@47986f0
## Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip/extract, and run.
> [!NOTE]
> - NVIDIA GPU: If `nvidia-smi` reports CUDA Version >= 13.1, use the cuda13.1 build. Otherwise, use cuda12.4.
> - ik_llama.cpp is a llama.cpp fork with new quant types. If unsure, use the llama.cpp column.
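The CUDA-build choice above can be scripted. The sketch below is illustrative only: the `pick_build` helper and the parsing pipeline are not part of the project.

```shell
# Hypothetical helper: maps the "CUDA Version" printed by nvidia-smi
# to the matching portable build name.
pick_build() {
  # Numeric comparison in awk, so e.g. 12.8 correctly sorts below 13.1.
  if awk -v v="$1" 'BEGIN { exit !(v + 0 >= 13.1) }'; then
    echo "cuda13.1"
  else
    echo "cuda12.4"
  fi
}

# Example (requires an NVIDIA driver):
#   pick_build "$(nvidia-smi | sed -n 's/.*CUDA Version: *\([0-9.]*\).*/\1/p')"
```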
### Windows
| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (774 MB) | Download (1.09 GB) |
| NVIDIA (CUDA 13.1) | Download (696 MB) | Download (1.19 GB) |
| AMD/Intel (Vulkan) | Download (209 MB) | — |
| AMD (ROCm 7.2) | Download (517 MB) | — |
| CPU only | Download (191 MB) | Download (192 MB) |
### Linux
| GPU/Platform | llama.cpp | ik_llama.cpp |
|---|---|---|
| NVIDIA (CUDA 12.4) | Download (758 MB) | Download (1.09 GB) |
| NVIDIA (CUDA 13.1) | Download (710 MB) | Download (1.21 GB) |
| AMD/Intel (Vulkan) | Download (225 MB) | — |
| AMD (ROCm 7.2) | Download (330 MB) | — |
| CPU only | Download (207 MB) | Download (218 MB) |
### macOS
| Architecture | llama.cpp |
|---|---|
| Apple Silicon (arm64) | Download (182 MB) |
| Intel (x86_64) | Download (188 MB) |
Updating a portable install:
- Download and extract the latest version.
- Replace the `user_data` folder with the one from your existing install. This carries over all your settings and models.
Starting with 4.0, you can also move `user_data` one folder up, next to the install folder. It will be detected automatically, making updates easier:

```
text-generation-webui-4.0/
text-generation-webui-4.1/
user_data/   <-- shared by both installs
```
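The shared layout above can be tried out safely in a throwaway directory. This is only a sketch: the version numbers and folder names are illustrative.

```shell
# Simulate moving user_data up one level so multiple installs share it.
tmp=$(mktemp -d)
cd "$tmp"
mkdir -p text-generation-webui-4.0/user_data   # existing install with your data
mkdir -p text-generation-webui-4.1             # freshly extracted new version
mv text-generation-webui-4.0/user_data .       # move user_data up one level
ls -d user_data                                 # prints: user_data
```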