koboldcpp-1.87
- NEW: Embeddings endpoint added - GGUF embedding models can now be loaded with `--embeddingsmodel` and accessed from `/v1/embeddings` or `/api/extra/embeddings`. This can be used to encode text for search or storage within a vector database (a hedged request sketch appears after the changelog below).
- NEW: Added OuteTTS Voice Cloning Support! - Now you can upload Speaker JSONs over the TTS API which represent a cloned voice when generating TTS. Read more here and try some sample speakers, or make your own.
- NEW: Merged Qwen2.5VL support from @HimariO fork - Also fixed issues with Qwen2VL when multiple images are used.
- NEW: Automatic function (tool) calling - Improved tool calling support, thanks to help from @henk717. KoboldCpp can now work with tool calls from frontends such as OpenWebUI. Additionally, auto mode is also supported, allowing the model to decide for itself whether a function call is needed and which tool to use (though manually selecting the desired tool with `tool_choice` still provides better results). Note that tool calling requires a relatively intelligent modern model to work correctly (recommended model: Gemma3). For more info on function calling, see here. The tool call detection template can be customized by setting `custom_tools_prompt` in the chat completions adapter. A hedged request sketch appears after the changelog below.
- NEW: Added Command Line Chat Mode - KoboldCpp has come full circle! Now you can use it fully without a GUI, just like good old llama.cpp. Simply run it with `--cli` to enter terminal mode, where you can chat interactively using the command line shell.
- Improved AMD rocwmma build detection, also changed the Vulkan build process (it now requires compiling shaders).
- Merged DP4A Vulkan enhancements by @0cc4m for greater performance on legacy quants in AMD and Intel, please report if you encounter any issues.
- `--quantkv` can now be used without flash attention; when this is done, only quantized-K is applied without quantized-V. Not really advised, as performance can suffer.
- Truncated base64 image printouts in the console (they were too long).
- Added a timeout for vulkaninfo in case it hangs.
- Fixed `--savedatafile` with relative paths.
- Fixed llama3 template AutoGuess detection.
- Added localtunnel as an alternative fallback option in the Colab, in case Cloudflare tunnels happen to be blocked.
- Updated Kobold Lite, multiple fixes and improvements
- NEW: Added World Info Groups - You can now categorize your world info entries into groups (e.g. for a character/location/event) and easily toggle them on and off in a single click.
- You can also toggle each entry on/off individually without removing it from WI.
- You can easily Import and Export each world info group as JSON to use within another story or chat.
- Added a menu to upload a cloned speaker JSON for use in voice cloning. Read the section for OuteTTS voice cloning above.
- Multiplayer mode UI streamlining.
- Added a toggle to allow uploading images as a new turn.
- Increased max resolution of uploaded images used with vision models.
- Switching a model in admin mode now auto refreshes Lite when completed
- Merged fixes and improvements from upstream
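For those who want to script against the new embeddings endpoint, here is a minimal sketch, assuming the server is running on the default port 5001 and that the request and response follow the OpenAI embeddings schema (the field names are assumptions; also see the format note below):

```python
# Minimal sketch of querying the new embeddings endpoint.
# Assumptions: default port 5001, OpenAI-style request/response fields
# (see the note below about the current response format).
import requests

resp = requests.post(
    "http://localhost:5001/v1/embeddings",
    json={"model": "any", "input": "KoboldCpp can now embed text for vector search."},
    timeout=60,
)
resp.raise_for_status()
data = resp.json()
# With the OpenAI schema, the vector is at data["data"][0]["embedding"].
embedding = data["data"][0]["embedding"]
print(len(embedding), "dimensions")
```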
Note: The embeddings endpoint currently returns JSON in an incorrect format. A hotfix is coming soon to resolve this.
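To illustrate auto mode for tool calling, here is a hedged sketch of a chat completions request offering a single made-up tool (get_weather is hypothetical); field names follow the OpenAI chat completions schema, and the actual behaviour depends on the loaded model and adapter:

```python
# Hedged sketch of automatic tool calling via the OpenAI-compatible chat completions route.
# The get_weather tool is hypothetical; with tool_choice set to "auto" the model decides
# for itself whether a call is needed.
import requests

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = requests.post(
    "http://localhost:5001/v1/chat/completions",
    json={
        "model": "any",
        "messages": [{"role": "user", "content": "What's the weather in Tokyo?"}],
        "tools": tools,
        "tool_choice": "auto",  # or name a specific tool for more reliable results
    },
    timeout=120,
)
resp.raise_for_status()
message = resp.json()["choices"][0]["message"]
# If the model chose to call the tool, the call appears under "tool_calls".
print(message.get("tool_calls") or message.get("content"))
```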
This month marks KoboldCpp entering its third year! Somehow it has survived, against all odds. Thanks to everyone who has provided valuable feedback, contributions and support over the past months. Enjoy!
To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
If you have an Nvidia GPU, but use an old CPU and koboldcpp.exe does not work, try koboldcpp_oldcpu.exe
If you have a newer Nvidia GPU, you can use the CUDA 12 version koboldcpp_cu12.exe (much larger, slightly faster).
If you're using Linux, select the appropriate Linux binary file instead (not exe).
If you're on a modern MacOS (M1, M2, M3) you can try the koboldcpp-mac-arm64 MacOS binary.
If you're using AMD, we recommend trying the Vulkan option (available in all releases) first, for best support. Alternatively, you can try koboldcpp_rocm at YellowRoseCx's fork here
Run it from the command line with the desired launch parameters (see `--help`), or manually select the model in the GUI.
Once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
For more information, be sure to run the program from the command line with the `--help` flag. You can also refer to the readme and the wiki.
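If you prefer to script against the server rather than use a client UI, here is a minimal sketch, assuming the default port and the standard KoboldAI generate route:

```python
# Minimal sketch of a text generation request against a running KoboldCpp instance.
# Assumptions: default port 5001 and the standard KoboldAI /api/v1/generate route.
import requests

resp = requests.post(
    "http://localhost:5001/api/v1/generate",
    json={"prompt": "Once upon a time,", "max_length": 50},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["results"][0]["text"])
```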