✨ Changes
- Portable zip builds for text-generation-webui + llama.cpp! You can now download a fully self-contained (~700 MB) version of the web UI with built-in llama.cpp support. No installation required.
  - Available for Windows, Linux, and macOS, with builds for `cuda12.4`, `cuda11.7`, `cpu`, macOS `arm64`, and macOS `x86_64`.
  - No Miniconda, no `torch`, no downloads after unzipping.
  - Comes bundled with a portable Python from astral-sh/python-build-standalone.
  - The web UI opens automatically in the browser; the API starts by default on `localhost` without the need to use `--api`.
  - All the compilation workflows are public, open-source, and executed on GitHub.
  - Fully private as always — no telemetry, no CDN resources, no remote requests.
- Make llama.cpp the default loader in the project.
- Add support for llama-cpp builds from https://github.com/ggml-org/llama.cpp (#6862). Thanks, @Matthew-Jenkins.
- Add back the `--model-menu` flag.
- Remove the `--gpu-memory` flag, and reuse the `--gpu-split` EXL2 flag for Transformers.
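Since the API now listens on `localhost` out of the box, one way to exercise a portable build is to post a chat-completion request to it from a script. A minimal sketch using only the standard library; the port (5000) and the OpenAI-compatible endpoint path are assumptions, not stated in these notes:

```python
import json
import urllib.request

# Assumed default address of the local API; adjust if your build
# reports a different port on startup.
API_URL = "http://localhost:5000/v1/chat/completions"


def build_payload(prompt: str) -> dict:
    """Build a minimal OpenAI-style chat-completion request body."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }


def ask(prompt: str) -> str:
    """POST the prompt to the local API and return the reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Standard OpenAI-compatible response shape.
    return body["choices"][0]["message"]["content"]
```

With a portable build running, `ask("Hello!")` should return the model's reply without any extra flags, since the API is enabled by default.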
🔄 Backend updates
- llama.cpp: Bump to commit ggml-org/llama.cpp@2016f07