koboldcpp-1.102
cold november rain edition
- New: Now bundles the llama.cpp UI into KoboldCpp, as an extra option for those who prefer it. Access it at http://localhost:5001/lcpp
- The llama.cpp UI is designed primarily for assistant use cases and provides a ChatGPT-like interface, with support for importing documents such as .pdf files. It can be used in parallel with the usual KoboldAI Lite UI (which remains recommended for roleplay/story writing) and does not take up any additional resources while not in use.
- New: Massive universal tool calling improvement from @Rose22. With the new format, KoboldCpp is now even better at calling tools and at using multiple tools in sequence correctly. It works automatically with all tool-calling-capable frontends (OpenWebUI, SillyTavern, etc.) in chat completions mode, and may even work on models that do not normally support tool calling in the correct format (see the request sketch after this list).
- New: Added support for jinja2 templates via `/v1/chat/completions`, for those that have been asking for it. There are 3 modes:
  - Current default: uses KoboldCpp ChatAdapter templates and the KoboldCpp universal toolcalling module (current behavior, most recommended).
  - With `--jinja`: uses the jinja2 template from the GGUF in chat completions mode for normal messages, and the KoboldCpp universal toolcalling module for tools. Use this only if you love jinja. Some GGUF models on Huggingface explicitly state that `--jinja` must be used to get normal results; this does not apply to KoboldCpp, as our regular modes already cover these cases.
  - With `--jinja_tools`: uses the jinja2 template from the GGUF in chat completions mode for all messages and tools. Not recommended in general; in this mode, the model and frontend are responsible for compatibility.
- Synced and updated Image Generation to the latest stable-diffusion.cpp, with big thanks to @wbruna. Please report any issues you encounter.
- Updated the Google Colab notebook with easier default selectable presets, thanks @henk717
- Allow GUI launcher window to be resized slightly larger horizontally, in case some text gets cut off.
- Fixed a divide by zero error with audio projectors
- Added Vulkan support for whisper.
- Case-insensitive filename search when selecting chat completion adapters
- Fixed an old bug that caused mirostat to swap parameters. To get the same result as before, swap the values of `tau` and `eta`.
- Added a debug command `--testmemory` to check what values auto GPU detection retrieves (not needed for most)
- Now serves the KoboldAI Lite UI gzipped to browsers that can support it, for faster UI loading.
- Added sampler support for smoothing curve
- Updated Kobold Lite, multiple fixes and improvements
- Web Link-sharing now defaults to dpaste.com, as dpaste.org has shut down
- Added option to save and load custom scenarios in a Scenario Library (similar to saved stories, but without most settings)
- Allow single-turn deletion and editing in classic theme instruct mode (click on the icon)
- Better turn chunking and repacking after editing a message
- Merged new model support, fixes and improvements from upstream
- NOTE: Qwen3Next support is NOT merged yet. It is still undergoing development upstream, follow it here: ggml-org#16095
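The universal tool calling improvement above is exposed through the OpenAI-compatible chat completions endpoint. Below is a minimal sketch of a tool-calling request against a local KoboldCpp instance with default settings; the port and the `/v1/chat/completions` route come from this release, while the `get_weather` tool and its schema are purely hypothetical, included only to illustrate the standard OpenAI-style `tools` format.

```python
# Minimal sketch: an OpenAI-style tool-calling request to a local KoboldCpp
# server. The get_weather tool is a hypothetical example, not something
# KoboldCpp ships; any function schema in this format should work.
import json
import urllib.request

payload = {
    "messages": [
        {"role": "user", "content": "What's the weather like in Paris?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool for illustration
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

req = urllib.request.Request(
    "http://localhost:5001/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    message = json.load(resp)["choices"][0]["message"]

# A tool call arrives as message["tool_calls"]; a plain reply as message["content"].
print(json.dumps(message, indent=2))
```

If the model decides to use the tool, the returned assistant message should carry an OpenAI-style `tool_calls` array rather than plain text content; frontends like OpenWebUI and SillyTavern handle that handshake automatically in chat completions mode.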
Separately, our Docker image has been updated to a newer, faster Vulkan driver for some AMD GPUs. If you use our Docker image, a manual docker pull is recommended, as these drivers are not always covered by the automatic updates.
Download and run koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), which is a one-file pyinstaller build for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try the oldpc version instead (CUDA 11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build first, for best support. Alternatively, you can try koboldcpp_rocm at YellowRoseCx's fork here if you are a Windows user or download our rolling ROCm binary here if you use Linux.
If you're on modern macOS (M-series), you can use the koboldcpp-mac-arm64 MacOS binary.
Click here for .gguf conversion and quantization tools
Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
and then once loaded, you can connect like this (or use the full koboldai client):
http://localhost:5001
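If you'd rather connect from a script than a browser, a minimal sketch using the KoboldAI-compatible generate API (served on the same port as the UI) looks like this; only `prompt` and `max_length` are set here, with all other sampler settings left at server defaults:

```python
# Minimal sketch: a plain text-generation request against the
# KoboldAI-compatible /api/v1/generate route of a running KoboldCpp server.
import json
import urllib.request

payload = {"prompt": "Once upon a time,", "max_length": 80}
req = urllib.request.Request(
    "http://localhost:5001/api/v1/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["results"][0]["text"])
```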
For more information, be sure to run the program from command line with the --help flag. You can also refer to the readme and the wiki.