Changes
- Log an error when a llama-server request exceeds the context size (#7263). Thanks, @mamei16.
- Make --trust-remote-code immutable from the UI/API for better security.
Bug fixes
- Fix metadata leaking into branched chats.
- Fix "continue" missing an initial space in chat-instruct/chat modes.
- Fix resuming incomplete downloads after HF moved to Xet.
- Revert exllamav3_hf changes in v3.14 that made it output gibberish.
Backend updates
- Update llama.cpp to https://github.com/ggml-org/llama.cpp/tree/f9fb33f2630b4b4ba9081ce9c0c921f8cd8ba4eb.
- Update exllamav3 to 0.0.10.
Portable builds
Below you can find self-contained packages that work with GGUF models (llama.cpp) and require no installation! Just download the right version for your system, unzip, and run.
Which version to download:
- Windows/Linux:
  - NVIDIA GPU: Use cuda12.4 for newer GPUs or cuda11.7 for older GPUs and systems with older drivers.
  - AMD/Intel GPU: Use vulkan builds.
  - CPU only: Use cpu builds.
- Mac:
  - Apple Silicon: Use macos-arm64.
  - Intel CPU: Use macos-x86_64.
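The selection rules above can be sketched as a small lookup. This is only an illustration: the function name `pick_build` and its parameters are hypothetical and not part of the project, and the caller has to state the GPU vendor and whether the NVIDIA GPU/driver is recent, since nothing here detects hardware.

```python
def pick_build(os_name, machine="", gpu="none", recent_nvidia=True):
    """Return the build label from the list above for a described system.

    Hypothetical helper for illustration only.
    os_name: e.g. the value of platform.system() ("Windows", "Linux", "Darwin").
    machine: e.g. the value of platform.machine() ("arm64", "x86_64").
    gpu: "nvidia", "amd", "intel", or "none" (supplied by the caller).
    recent_nvidia: False for older GPUs or systems with older drivers.
    """
    os_name = os_name.lower()
    if os_name in ("darwin", "mac", "macos"):
        # Apple Silicon reports an ARM machine type; anything else is treated as Intel.
        if machine.lower() in ("arm64", "aarch64"):
            return "macos-arm64"
        return "macos-x86_64"
    if os_name in ("windows", "linux"):
        gpu = gpu.lower()
        if gpu == "nvidia":
            return "cuda12.4" if recent_nvidia else "cuda11.7"
        if gpu in ("amd", "intel"):
            return "vulkan"
        return "cpu"
    raise ValueError(f"no portable build listed for OS: {os_name!r}")
```

For example, `pick_build("Linux", gpu="amd")` selects the vulkan build, and `pick_build("Darwin", machine="arm64")` selects macos-arm64.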
Updating a portable install:
- Download and unzip the latest version.
- In the new version, replace the user_data folder with the one from your existing install. All your settings and models will be carried over.
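If you prefer to script the update, the two steps above could look roughly like the following. It is a minimal sketch, assuming both installs are already unzipped and each has a user_data folder at its top level; the function name `migrate_user_data` and the example paths are hypothetical. Back up your old install first if in doubt, since the new version's default user_data folder is deleted.

```python
import shutil
from pathlib import Path


def migrate_user_data(old_install, new_install):
    """Move user_data from an existing install into a freshly unzipped one.

    Hypothetical helper for illustration only.
    """
    old_data = Path(old_install) / "user_data"
    new_data = Path(new_install) / "user_data"

    if not old_data.is_dir():
        raise FileNotFoundError(f"no user_data folder found in {old_install}")

    # Remove the default folder shipped with the new version, if present.
    if new_data.exists():
        shutil.rmtree(new_data)

    # Move (not copy) the existing folder, so settings and models are not duplicated.
    shutil.move(str(old_data), str(new_data))
    return new_data
```

A call such as `migrate_user_data("old-install", "new-install")` would then leave the new install with your previous settings and models.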