oobabooga/text-generation-webui v2.8

✨ Changes

  • New llama.cpp loader (#6846). A brand-new, lightweight llama.cpp loader based on llama-server has been added, replacing llama-cpp-python. With that:
    • New sampling parameters are now available in the llama.cpp loader, including xtc, dry, and dynatemp (a request sketch follows this list).
    • llama.cpp has been updated to the latest version, adding support for the new Llama-4-Scout-17B-16E-Instruct model.
    • The installation size for the project has been reduced.
    • llama.cpp performance should be slightly faster.
    • llamacpp_HF had to be removed :( There is just one llama.cpp loader from now on.
    • llama.cpp updates will be much more frequent from now on.
  • Smoother chat streaming in the UI. Words now appear one at a time in the Chat tab instead of in chunks, which makes streaming feel nicer.
  • Allow for model subfolder organization for GGUF files (#6686). Thanks, @Googolplexed0.
    • With that, llama.cpp models can be placed in subfolders inside text-generation-webui/models for better organization (or for importing files from LM Studio); a hypothetical layout is sketched below.
  • Remove some obsolete command-line flags to clean up the repository.
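
Since the new loader wraps llama-server, the same samplers can also be exercised against a standalone llama-server instance over HTTP. Below is a minimal sketch: the field names follow llama.cpp's /completion API, while the address, prompt, and parameter values are placeholders, not recommendations.

```python
# Minimal sketch: send a completion request to a standalone llama-server
# instance using the samplers the new loader exposes (xtc, dry, dynatemp).
import json
import urllib.request

payload = {
    "prompt": "The quick brown fox",   # placeholder prompt
    "n_predict": 64,
    # Dynamic temperature: temperature varies within +/- dynatemp_range
    "temperature": 0.8,
    "dynatemp_range": 0.4,
    # XTC: probabilistically excludes the most likely tokens from sampling
    "xtc_probability": 0.5,
    "xtc_threshold": 0.1,
    # DRY: penalizes verbatim repetition of earlier token sequences
    "dry_multiplier": 0.8,
    "dry_base": 1.75,
    "dry_allowed_length": 2,
}

req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",  # llama-server's default address
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])
```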
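
To illustrate the subfolder support from #6686, here is a hypothetical models directory; all folder and file names below are made up for the example:

```
text-generation-webui/
└── models/
    ├── meta-llama/
    │   └── Llama-4-Scout-17B-16E-Instruct-Q4_K_M.gguf
    └── lm-studio-imports/
        └── some-other-model.Q8_0.gguf
```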

🔧 Bug fixes

  • Fix an overflow bug in ExLlamaV2_HF introduced after recent updates.
  • Fix GPTQ models being loaded through Transformers instead of ExLlamaV2_HF.

🔄 Backend updates

  • llama.cpp: Bump to commit b9154ecff93ff54dc554411eb844a2a654be49f2 from April 18th, 2025.
  • ExLlamaV3: Bump to commit c44e56c73b2c67eee087c7195c9093520494d3bf from April 18th, 2025.
