## Changes
- Replace `use_flash_attention_2`/`use_eager_attention` with a unified `attn_implementation` option in the Transformers loader
- Ignore `add_bos_token` in instruct prompts, letting the jinja2 template decide
- Add a "None" option for the speculative decoding model
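The unified option maps directly onto the `attn_implementation` argument that Transformers' `from_pretrained` accepts. A minimal sketch of how a loader might translate the two legacy booleans into that single string (the helper name is illustrative; the flag names come from the notes above):

```python
def resolve_attn_implementation(use_flash_attention_2=False, use_eager_attention=False):
    """Map the two legacy boolean flags to a single attn_implementation string."""
    if use_flash_attention_2:
        return "flash_attention_2"
    if use_eager_attention:
        return "eager"
    return "sdpa"  # Transformers' default scaled-dot-product attention

# The resulting string is what Transformers expects, e.g.:
# AutoModelForCausalLM.from_pretrained(model_id, attn_implementation="flash_attention_2")
```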
## Backend updates
- Update llama.cpp to https://github.com/ggml-org/llama.cpp/tree/90083283ec254fa8d33897746dea229aee401b37
- Update Transformers to 4.53
  - Also update bitsandbytes/Accelerate/PEFT to the latest versions
- Update ExLlamaV3 to 0.0.5
- Update ExLlamaV2 to 0.3.2
## Portable builds
Below you can find portable builds: self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
Which version to download:
- Windows/Linux:
  - NVIDIA GPU: Use `cuda12.4` for newer GPUs or `cuda11.7` for older GPUs and systems with older drivers.
  - AMD/Intel GPU: Use `vulkan` builds.
  - CPU only: Use `cpu` builds.
- Mac:
  - Apple Silicon: Use `macos-arm64`.
  - Intel CPU: Use `macos-x86_64`.
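The decision table above can be expressed as a small helper. This is a sketch for illustration only (the function name and parameters are hypothetical; the flavor strings match the build names listed above):

```python
def pick_build(os_name, gpu=None, apple_silicon=False, old_nvidia_driver=False):
    """Return the portable build flavor for a given system.

    os_name: "windows", "linux", or "mac"
    gpu: "nvidia", "amd", "intel", or None for CPU-only
    """
    if os_name == "mac":
        return "macos-arm64" if apple_silicon else "macos-x86_64"
    if gpu == "nvidia":
        # cuda11.7 covers older GPUs and systems with older drivers
        return "cuda11.7" if old_nvidia_driver else "cuda12.4"
    if gpu in ("amd", "intel"):
        return "vulkan"
    return "cpu"
```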
Updating a portable install:
- Download and unzip the latest version.
- Replace the `user_data` folder in the new version with the one from your existing install. All your settings and models will be carried over.