✨ Changes
- Portable zip builds for `text-generation-webui` + `llama.cpp`! You can now download a fully self-contained (~700 MB) version of the web UI with built-in `llama.cpp` support. No installation required.
  - Available for Windows, Linux, and macOS, with builds for `cuda12.4`, `cuda11.7`, `cpu`, macOS `arm64`, and macOS `x86_64`.
  - No Miniconda, no `torch`, no downloads after unzipping.
  - Comes bundled with a portable Python from astral-sh/python-build-standalone.
  - The web UI opens automatically in the browser, and the API starts by default on `localhost` without the need to use `--api` (see the example after this list).
  - All the compilation workflows are public, open-source, and executed on GitHub.
  - Fully private as always: no telemetry, no CDN resources, no remote requests.
- Make llama.cpp the default loader in the project.
- Add support for llama-cpp builds from https://github.com/ggml-org/llama.cpp (#6862). Thanks, @Matthew-Jenkins.
- Add back the `--model-menu` flag.
- Remove the `--gpu-memory` flag, and reuse the `--gpu-split` EXL2 flag for Transformers.
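
Below is a minimal sketch of how a script could talk to the API that the portable build starts by default. The port (5000) and the OpenAI-compatible `/v1/chat/completions` path are assumptions based on the project's usual defaults rather than something stated in these notes; adjust them if your build is configured differently.

```python
# Minimal sketch: call the local API started by the portable build.
# Assumptions (not stated in these notes): the server listens on
# localhost:5000 and exposes an OpenAI-compatible /v1/chat/completions route.
import requests

response = requests.post(
    "http://127.0.0.1:5000/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Hello! Summarize your capabilities."}],
        "max_tokens": 128,
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```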
🔄 Backend updates
- llama.cpp: Bump to commit ggml-org/llama.cpp@2016f07