✨ Changes
- New llama.cpp loader (#6846). A brand new, lightweight llama.cpp loader based on `llama-server` has been added, replacing `llama-cpp-python`. With that:
  - New sampling parameters are now available in the llama.cpp loader, including `xtc`, `dry`, and `dynatemp` (see the sketch after this list).
  - llama.cpp has been updated to the latest version, adding support for the new Llama-4-Scout-17B-16E-Instruct model.
  - The installation size for the project has been reduced.
  - llama.cpp performance should be slightly faster.
  - llamacpp_HF had to be removed :( There is just 1 llama.cpp loader from now on.
  - llama.cpp updates will be much more frequent from now on.
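  Since the new loader is built on `llama-server`, these samplers correspond to `llama-server`'s own request parameters. The snippet below is a minimal sketch, not part of the release: it sends the new sampling options straight to a locally running `llama-server`. The address, port, and exact parameter names (`dry_multiplier`, `xtc_probability`, `dynatemp_range`, etc.) follow `llama-server`'s `/completion` API and may differ in your build.

  ```python
  # Minimal sketch: call a local llama-server directly with the new samplers.
  # Assumes llama-server is running on its default address (127.0.0.1:8080).
  import json
  import urllib.request

  payload = {
      "prompt": "Write a haiku about spring rain.",
      "n_predict": 64,
      "temperature": 1.0,
      # DRY ("don't repeat yourself") repetition penalty
      "dry_multiplier": 0.8,
      "dry_base": 1.75,
      # XTC ("exclude top choices") sampling
      "xtc_probability": 0.5,
      "xtc_threshold": 0.1,
      # Dynamic temperature: temperature is allowed to vary within +/- this range
      "dynatemp_range": 0.5,
  }

  req = urllib.request.Request(
      "http://127.0.0.1:8080/completion",  # assumed default llama-server endpoint
      data=json.dumps(payload).encode("utf-8"),
      headers={"Content-Type": "application/json"},
  )
  with urllib.request.urlopen(req) as resp:
      print(json.loads(resp.read())["content"])
  ```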
- Smoother chat streaming in the UI. Words now appear one at a time in the Chat tab instead of in chunks, which makes streaming feel nicer.
- Allow for model subfolder organization for GGUF files (#6686). Thanks, @Googolplexed0.
  - With that, llama.cpp models can be placed in subfolders inside `text-generation-webui/models` for better organization (or for importing files from LM Studio). An illustrative layout is shown below.
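  As an illustration of the subfolder support, here is one possible layout; the folder and file names are made up:

  ```
  text-generation-webui/
  └── models/
      ├── llama-4/
      │   └── Llama-4-Scout-17B-16E-Instruct-Q4_K_M.gguf
      └── lm-studio-imports/
          └── some-other-model-Q8_0.gguf
  ```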
- Remove some obsolete command-line flags to clean up the repository.
🔧 Bug fixes
- Fix an overflow bug in ExLlamaV2_HF introduced after recent updates.
- Fix GPTQ models being loaded through Transformers instead of ExLlamaV2_HF.
🔄 Backend updates
- llama.cpp: Bump to commit `b9154ecff93ff54dc554411eb844a2a654be49f2` from April 18th, 2025.
- ExLlamaV3: Bump to commit `c44e56c73b2c67eee087c7195c9093520494d3bf` from April 18th, 2025.