oobabooga/text-generation-webui v3.10 on GitHub

See the Multimodal Tutorial

Changes

Add multimodal support to the UI and API
- With the llama.cpp loader (#7027). This was possible thanks to PR ggml-org/llama.cpp#15108 to llama.cpp. Thanks @65a.
- With ExLlamaV3 through a new ExLlamaV3 loader (#7174). Thanks @Katehuuh.
Add speculative decoding to the new ExLlamaV3 loader.
Use ExLlamav3 instead of ExLlamav3_HF by default for EXL3 models, since it supports multimodal and speculative decoding.
Support loading chat templates from chat_template.json files (EXL3/EXL2/Transformers models)
Default max_tokens to 512 in the API instead of 16
Better organize the right sidebar in the UI
llama.cpp: Pass --swa-full to llama-server when streaming-llm is checked to make it work for models with SWA.

Bug fixes

Fix getting the ctx-size for newer EXL3/EXL2/Transformers models
Fix the exllamav2 loader ignoring add_bos_token
Fix the color of italic text in chat messages
Fix edit window and buttons in Messenger theme (#7100). Thanks @mykeehu.

Backend updates

Bump llama.cpp to ggml-org/llama.cpp@f4586ee

Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.

Which version to download:

Windows/Linux:
- NVIDIA GPU: Use cuda12.4 for newer GPUs or cuda11.7 for older GPUs and systems with older drivers.
- AMD/Intel GPU: Use vulkan builds.
- CPU only: Use cpu builds.
Mac:
- Apple Silicon: Use macos-arm64.
- Intel CPU: Use macos-x86_64.

Updating a portable install:

Download and unzip the latest version.
Replace the user_data folder with the one in your existing install. All your settings and models will be moved.

oobabooga/text-generation-webui v3.10 v3.10 - Multimodal support! on GitHub

Changes

Bug fixes

Backend updates

Portable builds

Which version to download:

Updating a portable install:

oobabooga/text-generation-webui v3.10
v3.10 - Multimodal support!

on GitHub