koboldcpp-1.101.1
very spooky edition
- Support for Qwen3-VL is merged - For a quick test, get the Qwen3-VL-2B-Instruct model here and the mmproj here. Larger versions exist, but this will work well enough for simple tasks.
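Once a Qwen3-VL model and its mmproj are loaded, you can exercise the vision support through the OpenAI-compatible chat endpoint. A minimal sketch, assuming the default port and the standard OpenAI image_url message format (the file name is a placeholder):

```python
# Minimal sketch: send an image to the OpenAI-compatible chat endpoint.
# Assumes KoboldCpp is running on the default port with Qwen3-VL and its
# mmproj loaded; "photo.png" is a placeholder.
import base64
import json
import urllib.request

with open("photo.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
    "max_tokens": 200,
}
req = urllib.request.Request(
    "http://localhost:5001/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["message"]["content"])
```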
- Added Qwen Image and Qwen Image Edit - Support is now officially available for Qwen Image generation models. These have much better prompt adherence than SDXL or even Flux. Here's how to set up Qwen Image Edit:
- Get the Qwen Image Edit 2509 model here and load it as the image gen model
- Get the Qwen Image VAE and load it as VAE
- Get Qwen2.5-VL-7B-Instruct and load it as Clip-1
- Get Qwen2.5-VL-7B mmproj and load it as Clip-2
- That's basically it! You can now generate images normally in Kobold or any connected frontend (a sketch of calling the image API directly follows this list).
- You can do image editing using the SDUI (http://localhost:5001/sdui) by uploading a source Reference Image and asking the AI to make changes. Alternatively, providing no reference image allows normal txt2img generation.
- To use the non-edit version of Qwen Image, you can use these models instead
- For a quick setup, you can use this .kcppt launcher template by @henk717
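For scripts or frontends that talk to the image API directly, here is a minimal txt2img sketch. It assumes KoboldCpp's A1111-style /sdapi/v1/txt2img endpoint on the default port; the prompt and sampler fields are placeholders, so check the wiki if your build differs:

```python
# Minimal sketch: txt2img through the A1111-compatible endpoint that
# KoboldCpp exposes for connected frontends (paths and fields are
# assumptions based on the A1111 API shape).
import base64
import json
import urllib.request

payload = {
    "prompt": "a watercolor painting of a lighthouse at dusk",
    "width": 1024,
    "height": 1024,
    "steps": 20,
}
req = urllib.request.Request(
    "http://localhost:5001/sdapi/v1/txt2img",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())

# The generated image comes back base64-encoded.
with open("output.png", "wb") as f:
    f.write(base64.b64decode(result["images"][0]))
```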
- Added aliases for the OpenAI compatible endpoints without the `/v1/` prefix.
- Supports using multiple `--overridekv` overrides, split by commas.
- Renamed `--blasbatchsize` to just `--batchsize` (the old name will still work).
- Made the GPU layer count preview in the GUI more accurate, no more +2 extra layers.
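A quick sketch of launching with the flags above from a script. The model path and override keys are placeholders, and the `key=type:value` syntax is an assumption carried over from the upstream llama.cpp `--override-kv` convention:

```python
# Hypothetical launch script; paths and override keys are illustrative only.
import subprocess

server = subprocess.Popen([
    "./koboldcpp-linux-x64",
    "--model", "model.gguf",
    "--batchsize", "512",  # formerly --blasbatchsize; old name still accepted
    # Multiple KV overrides in one flag, split by commas (value syntax
    # assumed to follow upstream llama.cpp's key=type:value convention):
    "--overridekv",
    "tokenizer.ggml.add_bos_token=bool:false,qwen3.expert_used_count=int:8",
])
server.wait()  # block until the server exits
```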
- Added experimental support for fractional scaling in the GUI launcher for Wayland on GNOME. You're still recommended to use KDE or disable fractional scaling for better results.
- Image generation precision fixes and fallbacks. SDUI also now supports copy with right click on the image preview.
- Added selection for image generation scheduler
- Added support for logprobs streaming in openai chat completions API (sent at end)
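A minimal sketch of requesting logprobs on a streamed chat completion. Per the note above, the logprobs arrive at the end of the stream; the exact chunk layout is assumed to follow the OpenAI format:

```python
# Minimal sketch: streamed chat completion with logprobs enabled
# (chunk shape is an assumption based on the OpenAI streaming format).
import json
import urllib.request

payload = {
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 32,
    "stream": True,
    "logprobs": True,
}
req = urllib.request.Request(
    "http://localhost:5001/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    for raw in resp:  # server-sent events, one "data: ..." line per chunk
        line = raw.decode("utf-8").strip()
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue
        choice = json.loads(line[len("data: "):])["choices"][0]
        if choice.get("logprobs"):  # sent at the end of the stream
            print("\nlogprobs:", choice["logprobs"])
        elif choice.get("delta", {}).get("content"):
            print(choice["delta"]["content"], end="", flush=True)
```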
- Added VITS api server compatibility endpoint
- PyInstaller upgraded from 5.11 to 5.12 to fix a crashing bug
- Added Horde worker Job stats by @xzuyn
- Updated Kobold Lite, multiple fixes and improvements
- New: Added branching support! You can now create ST style "branches" in the same story, allowing you to explore multiple alternate possibilities without requiring multiple save files. You can create and delete branches at any point in your story and swap between them at will.
- Better inline markdown and code rendering
- Better turn history segmenting after leaving edit mode, also improved AutoRole turn packing
- Improve trim sentences behavior, improve autoscroll behavior, improve mobile detection
- Added ccv3 tavern card support
- Aborted gens will now request logprobs if enabled
- Merged new model support, fixes and improvements from upstream, including some Vulkan speedups from occam
- NOTE: Qwen3Next support is NOT merged yet. It is still undergoing development upstream, follow it here: ggml-org#16095
Hotfix 1.101.1 - Fixed a regression with rowsplit, fixed an issue loading very old mmproj files, and fixed a crash with Qwen Image Edit.
Starting at 1.101.1 we have upgraded the bundled ROCm library of our ROCm Linux binary to 7.1, which affects which GPUs are supported. You should now be able to use KoboldCpp on your 9000 series GPU on Linux without having to compile from source. If your system's driver was capable of running the last ROCm release, updating drivers is not required; the binary will automatically use ROCm 7.1 even if you have an older ROCm installed.
Download and run the koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), which is a one-file pyinstaller for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try the oldpc version instead (CUDA 11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build first, for best support. Alternatively, you can try koboldcpp_rocm at YellowRoseCx's fork here if you are a Windows user or download our rolling ROCm binary here if you use Linux.
If you're on a modern MacOS (M-Series) you can use the koboldcpp-mac-arm64 MacOS binary.
Click here for .gguf conversion and quantization tools
Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
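To script a first request instead of using the browser, here is a minimal sketch against the native KoboldAI API (field names follow the documented /api/v1/generate schema; the prompt is a placeholder):

```python
# Minimal sketch: one generation request against the native KoboldAI API,
# assuming the server is running on the default port.
import json
import urllib.request

payload = {"prompt": "Once upon a time,", "max_length": 80}
req = urllib.request.Request(
    "http://localhost:5001/api/v1/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["results"][0]["text"])
```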
For more information, be sure to run the program from command line with the --help flag. You can also refer to the readme and the wiki.