koboldcpp-1.88


  • NEW: Added Image Inpainting support to StableUI, and merged inpainting support from stable-diffusion.cpp (by @stduhpf)
    • You can use the built-in StableUI to mask out areas to inpaint when editing with Img2Img (similar to A1111); a sketch of the API flow follows this list. The API docs for this are updated.
    • Added slider for setting clip-skip in StableUI.
    • Other improvements from stable-diffusion.cpp are also merged.
  • Added Zenity and YAD support for displaying file picker dialogs on Linux (by @henk717); if either is installed on your system, it will be used. To keep using the previous Tkinter file picker, select "Use Classic FilePicker" in the Extras tab.
  • Added a new API endpoint /api/extra/json_to_grammar, which converts a JSON schema into a GBNF grammar (check the API docs for an example; also sketched after this list).
  • Added a --maxrequestsize flag, which lets you configure the maximum payload size the server accepts before an HTTP request is dropped (default 32 MB).
  • GPU memory estimation can now also use vulkaninfo (when nvidia-smi is not available).
  • Merged Llama 4 support from upstream llama.cpp. Qwen3 is technically included too, but until it is officially released we won't know whether it actually works.
  • Fixed the backend and layers not being auto-set when swapping to a new model in admin mode using a template.
  • Added additional warnings in the GUI and terminal when you try to use FlashAttention on the Vulkan backend; this is generally discouraged due to performance issues.
  • Fixed the system prompt in the gemma3 template
  • Updated Kobold Lite with multiple fixes and improvements
    • Added the Llama 4 prompt format
    • Consolidated the vision dropdown when selecting a vision provider
    • Fixed a think-token formatting issue with markdown
  • Merged fixes and improvements from upstream
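
Since StableUI drives an A1111-style image API, here is a minimal inpainting sketch in Python. The endpoint path and the init_images/mask field names follow the A1111 convention and are assumptions on my part; confirm the exact payload shape against the updated API docs.

```python
import base64, json, urllib.request

def b64(path):
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

# Inpainting request against the (assumed) A1111-compatible img2img endpoint.
payload = {
    "prompt": "a red brick wall",
    "init_images": [b64("input.png")],  # source image, base64-encoded
    "mask": b64("mask.png"),            # white = repaint, black = keep (A1111 convention)
    "denoising_strength": 0.75,
}
req = urllib.request.Request(
    "http://localhost:5001/sdapi/v1/img2img",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    out = json.loads(resp.read())

# Result images come back base64-encoded; the first one is the inpainted image.
with open("inpainted.png", "wb") as f:
    f.write(base64.b64decode(out["images"][0]))
```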
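
And a sketch of the new grammar-conversion endpoint. The release notes don't spell out the payload shape, so posting the schema directly as the JSON body is an assumption; the API docs include a worked example.

```python
import json, urllib.request

# A small JSON schema to convert into a GBNF grammar.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}
req = urllib.request.Request(
    "http://localhost:5001/api/extra/json_to_grammar",
    data=json.dumps(schema).encode(),  # assumed: schema posted as the raw body
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())  # GBNF grammar, usable as a grammar parameter
```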

To use, download and run koboldcpp.exe, which is a one-file PyInstaller build.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller.
If you have an Nvidia GPU but an old CPU and koboldcpp.exe does not work, try koboldcpp_oldcpu.exe.
If you have a newer Nvidia GPU, you can use the CUDA 12 version, koboldcpp_cu12.exe (much larger, slightly faster).
If you're using Linux, select the appropriate Linux binary instead (not the .exe).
If you're on a modern macOS device (M1, M2, M3), you can try the koboldcpp-mac-arm64 binary.
If you're using AMD, we recommend trying the Vulkan option (available in all releases) first for best support. Alternatively, you can try the koboldcpp_rocm builds from YellowRoseCx's fork.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once the model is loaded, you can connect in your browser (or use the full KoboldAI client) at:
http://localhost:5001
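
If you'd rather script against the server than use the browser, here is a minimal sketch against the standard KoboldAI-compatible generate endpoint (sampler parameters beyond prompt and max_length are omitted; see the API docs served at /api for the full list):

```python
import json, urllib.request

# Minimal completion request against the KoboldAI-compatible API.
payload = {"prompt": "Once upon a time,", "max_length": 80}
req = urllib.request.Request(
    "http://localhost:5001/api/v1/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["results"][0]["text"])
```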

For more information, be sure to run the program from the command line with the --help flag. You can also refer to the readme and the wiki.
