github oobabooga/text-generation-webui v4.2

7 hours ago

Changes

  • Anthropic-compatible API: A new /v1/messages endpoint lets you connect Claude Code, Cursor, and other Anthropic API clients. Supports system messages, content blocks, tool use, tool results, image inputs, and thinking blocks. To use with Claude Code: ANTHROPIC_BASE_URL=http://127.0.0.1:5000 claude.
  • Updated UI theme: New colors, borders, and button styles across light and dark modes.
  • --extra-flags now supports literal flags: You can now pass flags directly, e.g. --extra-flags "--rpc 192.168.1.100:50052 --jinja". The old key=value format is still accepted for backwards compatibility.
  • Training
    • Enable gradient_checkpointing by default for lower VRAM usage during training.
    • Remove the arbitrary higher_rank_limit parameter.
    • Reorganize the training UI.
  • Strip thinking blocks before tool-call parsing to prevent false-positive tool call detection from <think> content.
  • Move the OpenAI-compatible API from extensions/openai to modules/api. The old --extensions openai flag is still accepted as an alias for --api.
  • Set top_p=0.95 as the default sampling parameter for API requests.
  • Remove 52 obsolete instruction templates from 2023 (Airoboros, Baichuan, Guanaco, Koala, Vicuna v0, MOSS, etc.).
  • Reduce portable build sizes by using a stripped Python distribution.
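The new `/v1/messages` endpoint follows the shape of the public Anthropic Messages API. A minimal sketch of calling it from Python, using only the standard library; the `"model"` placeholder and default `max_tokens` here are illustrative assumptions, not confirmed server defaults:

```python
# Sketch: call the local Anthropic-compatible /v1/messages endpoint.
# Payload fields mirror the public Anthropic Messages API; the release
# notes confirm support for system messages, content blocks, tool use,
# tool results, image inputs, and thinking blocks.
import json
import urllib.request


def build_messages_payload(prompt, system=None, max_tokens=512):
    """Assemble a minimal Messages API request body."""
    payload = {
        "model": "local",  # placeholder; the server serves whatever model is loaded
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    if system:
        payload["system"] = system
    return payload


def send(payload, base_url="http://127.0.0.1:5000"):
    """POST the payload to the local server and return the parsed response."""
    req = urllib.request.Request(
        f"{base_url}/v1/messages",
        data=json.dumps(payload).encode(),
        headers={"content-type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(send(build_messages_payload("Hello!", system="Be terse.")))
```

Pointing an Anthropic SDK client at the same base URL (as in the Claude Code example above) should produce an equivalent request.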
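The thinking-block fix above can be illustrated with a short sketch: reasoning traces often contain JSON-like text that a tool-call parser would match, so they are removed first. The function name and regex here are hypothetical, not the project's actual implementation:

```python
# Sketch of the technique: strip <think>...</think> spans before the
# tool-call parser scans the model output, so JSON inside the reasoning
# trace is not mistaken for a real tool call.
import re

THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_thinking(text):
    """Drop reasoning blocks so only the final output reaches the parser."""
    return THINK_BLOCK.sub("", text)


reply = '<think>maybe call {"name": "search"}?</think>{"name": "weather", "arguments": {}}'
clean = strip_thinking(reply)
# Only the real tool call outside the thinking block survives.
```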

Bug fixes

  • Fix prompt corruption when continuing a chat with context truncation (#7439). Thanks, @Phrosty1.
  • Fix multi-turn thinking block corruption for Kimi models.
  • Fix AMD installer failing to resolve ROCm triton dependency.
  • Fix the --share feature in the Gradio fork.
  • Fix --extra-flags breaking long-form-only flags like --rpc.
  • Fix the instruction template delete dialog not appearing.
  • Fix file handle leaks and redundant re-reads in model metadata loading (#7422). Thanks, @alvinttang.
  • Fix the broken delete endpoint in superboogav2 (#6010). Thanks, @Raunak-Kumar7.
  • Fix leading spaces in post-reasoning content in API responses.
  • Fix Cloudflare tunnel retry logic raising after the first failed attempt instead of exhausting retries.
  • Fix OPENEDAI_DEBUG=0 being treated as truthy.
  • Fix mutable default argument in LogitsBiasProcessor (#7426). Thanks, @Jah-yee.
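The LogitsBiasProcessor fix belongs to a well-known Python bug class: a mutable default argument is created once at function definition and then shared across all calls. A generic illustration of the pitfall and the conventional `None`-sentinel fix (the class name here is only loosely modeled on the one in the release notes):

```python
# Illustration of the mutable-default-argument bug class.
# A default like `biases={}` is evaluated once, so every instance that
# relies on the default would share the SAME dict.
class BiasProcessor:
    def __init__(self, biases=None):  # fixed: None sentinel instead of biases={}
        self.biases = {} if biases is None else biases


a = BiasProcessor()
a.biases["token_42"] = 1.5
b = BiasProcessor()
# With the buggy `biases={}` default, b.biases would also contain "token_42".
```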
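The OPENEDAI_DEBUG fix is another common pattern: environment variables are strings, and `bool("0")` is `True` in Python, so the raw value must be interpreted rather than truth-tested. A hedged sketch of the technique, with a hypothetical helper name:

```python
# Sketch: parse a boolean environment variable correctly.
# bool(os.environ.get("OPENEDAI_DEBUG")) would treat "0" as truthy,
# since any non-empty string is truthy in Python.
import os

FALSY = {"", "0", "false", "no", "off"}


def env_flag(name, default=False):
    """Return the env var interpreted as a boolean, or `default` if unset."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() not in FALSY
```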

Dependency updates


Portable builds

Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip/extract, and run.

Which version to download:

  • Windows/Linux:

    • NVIDIA GPU: Use cuda13.1, or cuda12.4 if you have older drivers.
    • AMD/Intel GPU: Use vulkan builds.
    • AMD GPU (ROCm): Use rocm builds.
    • CPU only: Use cpu builds.
  • Mac:

    • Apple Silicon: Use macos-arm64.
    • Intel: Use macos-x86_64.

Updating a portable install:

  1. Download and extract the latest version.
  2. Copy the user_data folder from your existing install into the new one. Your settings and models will carry over.

Starting with 4.0, you can also move user_data one folder up, next to the install folder. It will be detected automatically, making updates easier:

text-generation-webui-4.0/
text-generation-webui-4.1/
user_data/                    <-- shared by both installs
