✨ Changes
- Portable zip builds for text-generation-webui + llama.cpp! You can now download a fully self-contained (~700 MB) version of the web UI with built-in llama.cpp support. No installation required.
  - Available for Windows, Linux, and macOS, with builds for `cuda12.4`, `cuda11.7`, `cpu`, macOS `arm64`, and macOS `x86_64`.
  - No Miniconda, no `torch`, no downloads after unzipping.
  - Comes bundled with a portable Python from astral-sh/python-build-standalone.
  - The web UI opens automatically in the browser; the API starts by default on `localhost` without the need to use `--api`.
  - All the compilation workflows are public, open-source, and executed on GitHub.
  - Fully private as always — no telemetry, no CDN resources, no remote requests.
- Make llama.cpp the default loader in the project.
- Add support for llama-cpp builds from https://github.com/ggml-org/llama.cpp (#6862). Thanks, @Matthew-Jenkins.
- Add back the `--model-menu` flag.
- Remove the `--gpu-memory` flag, and reuse the `--gpu-split` EXL2 flag for Transformers.
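Since the API now listens on `localhost` out of the box, one way to exercise a portable build is to post a chat-completion request to it from a script. A minimal sketch using only the standard library; the port (5000) and the OpenAI-compatible endpoint path are assumptions, not stated in these notes:

```python
import json
import urllib.request

# Assumed default address of the local API; adjust if your build
# reports a different port on startup.
API_URL = "http://localhost:5000/v1/chat/completions"


def build_payload(prompt: str) -> dict:
    """Build a minimal OpenAI-style chat-completion request body."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }


def ask(prompt: str) -> str:
    """POST the prompt to the local API and return the reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Standard OpenAI-compatible response shape.
    return body["choices"][0]["message"]["content"]
```

With a portable build running, `ask("Hello!")` should return the model's reply without any extra flags, since the API is enabled by default.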
🔄 Backend updates
- llama.cpp: Bump to commit ggml-org/llama.cpp@2016f07