oobabooga/text-generation-webui v3.18

Changes

Add --cpu-moe flag for llama.cpp to move MoE model experts to CPU, reducing VRAM usage.
Add ROCm portable builds for AMD GPUs on Linux. This was made possible by PR oobabooga/llama-cpp-binaries#7 by @ShortTimeNoSee. Thanks, @ShortTimeNoSee.
Remove deprecated macOS 13 wheels (no longer supported by GitHub Actions).
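To illustrate why the --cpu-moe flag reduces VRAM usage, the sketch below splits a model's weight footprint between GPU memory and system RAM. It is a back-of-the-envelope sketch, not llama.cpp's actual allocator. It assumes, hypothetically, that a known fraction of the GGUF weights belongs to MoE expert tensors, and it ignores the KV cache, activations, and compute buffers. The function name `split_weights_gib` and all numbers are illustrative.

```python
def split_weights_gib(total_gib: float, expert_fraction: float, cpu_moe: bool) -> tuple[float, float]:
    """Return a rough (gpu_gib, cpu_gib) split of model weights.

    Hypothetical helper: expert_fraction is the assumed share of weights
    stored in MoE expert tensors (it varies by model architecture).
    KV cache, activations, and compute buffers are not modeled.
    """
    if not 0.0 <= expert_fraction <= 1.0:
        raise ValueError("expert_fraction must be between 0 and 1")
    if not cpu_moe:
        # Everything is offloaded to the GPU.
        return total_gib, 0.0
    # With experts kept on the CPU, only the remaining weights occupy VRAM.
    experts_gib = total_gib * expert_fraction
    return total_gib - experts_gib, experts_gib


if __name__ == "__main__":
    # Illustrative numbers only: a 64 GiB model where 75% of weights are experts.
    print(split_weights_gib(64.0, 0.75, cpu_moe=False))
    print(split_weights_gib(64.0, 0.75, cpu_moe=True))
```

The trade-off is that expert computations then run from system RAM, so generation is typically slower than a full GPU offload would be, in exchange for fitting a larger MoE model on a smaller GPU.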

Backend updates

Update llama.cpp to https://github.com/ggml-org/llama.cpp/tree/10e9780154365b191fb43ca4830659ef12def80f
Update ExLlamaV3 to 0.0.15
Update peft to 0.18.*
Update triton-windows to 3.5.1.post21

Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.

Which version to download:

Windows/Linux:
- NVIDIA GPU: Use cuda12.4.
- AMD/Intel GPU: Use vulkan builds.
- CPU only: Use cpu builds.
Mac:
- Apple Silicon: Use macos-arm64.

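The download guidance above can be expressed as a small lookup. The sketch below is a hypothetical helper, not part of text-generation-webui. It takes `platform.system()`/`platform.machine()`-style values plus a GPU vendor that the user supplies, because the standard library cannot detect the GPU. It follows only the list above. Linux users with AMD GPUs may also want the new ROCm portable builds mentioned under Changes, which this sketch does not select.

```python
import platform
from typing import Optional


def pick_portable_build(system: str, machine: str, gpu_vendor: Optional[str]) -> str:
    """Map the release notes' download guidance to a build label.

    Hypothetical helper. system/machine use platform.system() and
    platform.machine() style values; gpu_vendor is "nvidia", "amd",
    "intel", or None for CPU-only machines.
    """
    if system == "Darwin":
        # Only Apple Silicon Macs are listed.
        if machine == "arm64":
            return "macos-arm64"
        raise ValueError("only Apple Silicon Macs are listed")
    if system not in ("Windows", "Linux"):
        raise ValueError(f"no portable build listed for {system!r}")
    if gpu_vendor is None:
        return "cpu"
    vendor = gpu_vendor.lower()
    if vendor == "nvidia":
        return "cuda12.4"
    if vendor in ("amd", "intel"):
        return "vulkan"
    raise ValueError(f"unknown GPU vendor {gpu_vendor!r}")


if __name__ == "__main__":
    try:
        # "nvidia" stands in for a vendor the user would supply.
        print(pick_portable_build(platform.system(), platform.machine(), "nvidia"))
    except ValueError as err:
        print(f"No matching portable build: {err}")
```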
Updating a portable install:

Download and unzip the latest version.
Replace the new version's user_data folder with the one from your existing install. All your settings and models will carry over.
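The folder swap can also be scripted. The sketch below is a hypothetical helper, not a tool shipped with the portable builds. It assumes both installs sit on disk side by side, each with a `user_data` folder at its top level. It moves the old folder instead of copying it, so large model files are not duplicated. As a result, the old install is left without its `user_data`.

```python
import shutil
from pathlib import Path


def carry_over_user_data(old_install: Path, new_install: Path) -> None:
    """Replace new_install/user_data with old_install/user_data.

    Hypothetical helper: assumes each install keeps a top-level user_data folder.
    """
    src = old_install / "user_data"
    dst = new_install / "user_data"
    if not src.is_dir():
        raise FileNotFoundError(f"no user_data folder in {old_install}")
    if dst.exists():
        # Discard the default user_data that ships with the fresh download.
        shutil.rmtree(dst)
    # Move (not copy) so multi-gigabyte models are not duplicated on disk.
    shutil.move(str(src), str(dst))
```

For example, `carry_over_user_data(Path("textgen-old"), Path("textgen-new"))`, where both folder names are placeholders for wherever the two installs were unzipped.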
