Experimental GPT-OSS support!

I have had some success with the GGUF models at:

https://huggingface.co/ggml-org/gpt-oss-20b-GGUF/tree/main
https://huggingface.co/ggml-org/gpt-oss-120b-GGUF/tree/main

It may be necessary to re-download these models in the coming days if bugs are found, so make sure to recheck those pages.

Changes

Add a new Reasoning effort UI element in the chat tab, with low, medium, and high options for GPT-OSS
Support standalone .jinja chat templates -- makes it possible to load GPT-OSS through Transformers
Make web search functional with thinking models

Bug fixes

Fix an edge case in chat history loading that caused a crash (closes #7155)
Handle both int and str types in grammar char processing (fixes a rare crash when using grammar)
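
To illustrate the kind of fix described in the grammar item above, here is a minimal sketch of normalizing a value that may arrive as either an int code point or a str. The helper name `to_char` is hypothetical and is not taken from the project's code; the real fix may look quite different.

```python
from typing import Union


def to_char(value: Union[int, str]) -> str:
    """Return a one-character string for an int code point or a str.

    Hypothetical helper: illustrates accepting both types instead of
    assuming one of them and crashing on the other.
    """
    if isinstance(value, int):
        # An int is treated as a Unicode code point.
        return chr(value)
    if isinstance(value, str):
        # A str is passed through unchanged.
        return value
    raise TypeError(f"expected int or str, got {type(value).__name__}")


if __name__ == "__main__":
    print(to_char(97), to_char("b"))
```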

Backend updates

Update llama.cpp to https://github.com/ggml-org/llama.cpp/tree/fd1234cb468935ea087d6929b2487926c3afff4b
Update Transformers to 4.55 (adds GPT-OSS support)

Portable builds

Below you can find portable builds: self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.

Which version to download:

Windows/Linux:
- NVIDIA GPU: Use cuda12.4 for newer GPUs or cuda11.7 for older GPUs and systems with older drivers.
- AMD/Intel GPU: Use vulkan builds.
- CPU only: Use cpu builds.
Mac:
- Apple Silicon: Use macos-arm64.
- Intel CPU: Use macos-x86_64.
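
The selection rules above can be sketched as a small function. This is only an illustration: `pick_build` and its parameters are hypothetical, the GPU vendor and the "older NVIDIA GPU or driver" flag must be supplied by you, and the returned strings are just the build labels named in the list.

```python
import platform


def pick_build(system: str, machine: str, gpu: str = "none",
               older_nvidia: bool = False) -> str:
    """Map a system description to one of the build labels listed above.

    system:  e.g. "Windows", "Linux", or "Darwin" (as from platform.system())
    machine: e.g. "x86_64" or "arm64" (as from platform.machine())
    gpu:     "nvidia", "amd", "intel", or "none" (assumed to be user-supplied)
    """
    system = system.lower()
    machine = machine.lower()
    gpu = gpu.lower()

    if system == "darwin":
        # Mac: Apple Silicon vs Intel CPU.
        return "macos-arm64" if machine in ("arm64", "aarch64") else "macos-x86_64"

    # Windows/Linux.
    if gpu == "nvidia":
        return "cuda11.7" if older_nvidia else "cuda12.4"
    if gpu in ("amd", "intel"):
        return "vulkan"
    return "cpu"


if __name__ == "__main__":
    # GPU vendor is not detected here; pass it explicitly if you have one.
    print(pick_build(platform.system(), platform.machine()))
```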

Updating a portable install:

Download and unzip the latest version.
Replace the user_data folder in the new version with the one from your existing install. All your settings and models will carry over.
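
If you prefer to script the update, the two steps above could look roughly like this. It is a sketch under assumptions: `migrate_user_data` is a hypothetical helper, both paths are placeholders for wherever you unzipped each version, and it copies rather than moves so the old install stays intact as a backup.

```python
import shutil
from pathlib import Path


def migrate_user_data(old_install: str, new_install: str) -> Path:
    """Replace new_install/user_data with a copy of old_install/user_data."""
    old_data = Path(old_install) / "user_data"
    new_data = Path(new_install) / "user_data"

    if not old_data.is_dir():
        raise FileNotFoundError(f"no user_data folder found in {old_install}")

    # Remove the fresh user_data folder that ships with the new version.
    if new_data.exists():
        shutil.rmtree(new_data)

    # Copy settings and models over from the existing install.
    shutil.copytree(old_data, new_data)
    return new_data
```

With a large models folder, copying needs enough free disk space for a second copy; swapping `shutil.copytree` for `shutil.move` avoids that at the cost of emptying the old install.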

oobabooga/text-generation-webui v3.9 on GitHub