oobabooga/textgen v4.8 on GitHub

Changes

Redesigned chat composer: Taller input area with the paperclip and message-action buttons pinned to the bottom, similar to Gemini and DeepSeek.
Smooth scroll animation when sending a new message: Inspired by Gemini's chat UI.
Electron improvements:
- Persist window bounds and maximize state across launches.
- Add a --no-electron flag to skip the desktop window and use the web UI in the browser instead.
- Disable spellcheck in the chat input.
API: Add support for list-format content in tool and assistant messages.
Add more space below the last chat/chat-instruct message so its action buttons have breathing room.

Electron:
- Fix --listen mode in the launcher.
- Fix missing log colors on Windows.
- Fix big character picture failing to load (#7540).
Fix speculative decoding broken by upstream llama.cpp arg renames (#7541).
Fix truncation length reverting after model load on UI reload (#7540).
Don't clear the chat input when sending a message with no model loaded (#7542).

TextGen is now a desktop app for local LLMs. Download, unzip, double-click.

Note

NVIDIA GPU: If nvidia-smi reports CUDA Version >= 13.1, use the cuda13.1 build. Otherwise, use cuda12.4.

ik_llama.cpp is a llama.cpp fork with new quant types. If unsure, use the llama.cpp column.

Architecture	llama.cpp
Apple Silicon (arm64)	Download (271 MB)
Intel (x86_64)	Download (283 MB)

Download and extract the latest version.
Replace the user_data folder with the one in your existing install. All your settings and models will be moved.

Starting with 4.0, you can also move user_data one folder up, next to the install folder. It will be detected automatically, making updates easier:

textgen-4.6/
textgen-4.7/
user_data/    <-- shared by both installs