koboldcpp-1.72

NEW: GPU-accelerated Stable Diffusion image generation is now possible on Vulkan. Huge thanks to @0cc4m!
Fixed an issue with mismatched CUDA device ID order.
Fixed incomplete SSE responses for short sequences (thanks @pi6am).
Fixed SSE streaming for Unicode-heavy languages, which should mitigate characters going missing due to failed decoding.
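The failure mode behind missing characters is worth illustrating: a multi-byte UTF-8 character can be split across two streamed chunks, and decoding each chunk on its own then drops or corrupts it. The Python sketch below is a client-side illustration of that general problem and of one standard remedy (an incremental decoder from the standard library `codecs` module); it is not koboldcpp's actual fix, and the helper names `decode_stream` and `naive_decode` are hypothetical.

```python
import codecs

def naive_decode(chunks):
    """Decode each chunk independently, silently dropping bytes that fail to decode."""
    return "".join(chunk.decode("utf-8", errors="ignore") for chunk in chunks)

def decode_stream(chunks):
    """Decode byte chunks into text, tolerating multi-byte UTF-8
    characters that are split across chunk boundaries."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    out = []
    for chunk in chunks:
        # The incremental decoder buffers an incomplete trailing byte
        # sequence until the rest of the character arrives.
        out.append(decoder.decode(chunk))
    # final=True flushes the buffer (and raises if the stream ended mid-character).
    out.append(decoder.decode(b"", final=True))
    return "".join(out)

text = "こんにちは"
data = text.encode("utf-8")  # each kana is 3 bytes in UTF-8
chunks = [data[:4], data[4:8], data[8:]]  # split in the middle of characters
print(naive_decode(chunks))   # some characters go missing
print(decode_stream(chunks))  # こんにちは
```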
GPU Layers now defaults to -1 in GUI mode instead of overwriting the existing layer count. The predicted layer count is now shown as overlay label text, so you can see the total number of layers as well as how the estimate changes when you adjust launcher settings.
Auto GPU layer estimation now takes loaded image generation and Whisper models into account.
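As a rough picture of why auxiliary models matter, the sketch below assumes a deliberately simplified model: the VRAM taken by image and Whisper models is subtracted from the budget before dividing by a per-layer cost. This is a hypothetical back-of-envelope calculation, not koboldcpp's actual estimator; the function name, parameters, and numbers are all illustrative assumptions.

```python
def estimate_gpu_layers(vram_mb, layer_mb, total_layers, reserved_mb=0, aux_model_mb=()):
    """Hypothetical estimate of how many model layers fit in VRAM.

    vram_mb       -- total VRAM available (MB)
    layer_mb      -- assumed VRAM cost of one layer (MB)
    total_layers  -- number of layers in the text model
    reserved_mb   -- headroom kept free for context/buffers (MB)
    aux_model_mb  -- VRAM used by extra models, e.g. image gen or Whisper (MB)
    """
    budget = vram_mb - reserved_mb - sum(aux_model_mb)
    if budget <= 0 or layer_mb <= 0:
        return 0
    # Never offload more layers than the model actually has.
    return min(total_layers, int(budget // layer_mb))

# Text model alone: everything fits.
print(estimate_gpu_layers(8192, 200, 33, reserved_mb=1024))
# Same GPU with an image model (2000 MB) and a Whisper model (500 MB) loaded.
print(estimate_gpu_layers(8192, 200, 33, reserved_mb=1024, aux_model_mb=(2000, 500)))
```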
Updated Kobold Lite: it now also supports SSE streaming over the OpenAI API, should you choose to use a different backend.
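For readers consuming such a stream themselves, here is a minimal Python sketch of parsing OpenAI-style SSE events. It assumes the common OpenAI streaming convention: each event arrives as a `data: {json}` line, completion streams carry `choices[0].text`, chat streams carry `choices[0].delta.content`, and the stream ends with `data: [DONE]`. The helper name `iter_sse_text` is hypothetical, and the sample mixes both formats only to exercise both branches.

```python
import json

def iter_sse_text(lines):
    """Yield text fragments from OpenAI-style SSE lines (assumed format)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            # Skip blank event separators, ':' comments and other SSE fields.
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        choice = json.loads(payload)["choices"][0]
        if "text" in choice:
            # Completions-style stream.
            yield choice["text"]
        else:
            # Chat-style stream; content may be absent or null in some deltas.
            yield choice.get("delta", {}).get("content") or ""

sample = [
    'data: {"choices": [{"text": "Hello"}]}',
    '',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    '',
    'data: [DONE]',
]
print("".join(iter_sse_text(sample)))  # Hello, world
```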
Merged fixes and improvements from upstream, including Gemma2 2B support.

To use, download and run koboldcpp.exe, which is a one-file PyInstaller build.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
If you have an Nvidia GPU but an old CPU, and koboldcpp.exe does not work, try koboldcpp_oldcpu.exe.
If you have a newer Nvidia GPU, you can use the CUDA 12 version koboldcpp_cu12.exe (much larger, slightly faster).
If you're using Linux, select the appropriate Linux binary file instead (not an .exe).
If you're using AMD, you can try koboldcpp_rocm at YellowRoseCx's fork.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI. Once the model is loaded, you can connect at the following address (or use the full KoboldAI client):
http://localhost:5001
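Besides the browser UI, the same address can be used programmatically. The Python sketch below builds a POST request to a KoboldAI-style `/api/v1/generate` endpoint using only the standard library. The endpoint path, the `prompt`/`max_length` fields, and the `{"results": [{"text": ...}]}` response shape are assumptions based on the KoboldAI API convention; check the API documentation served by your own instance before relying on them. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5001"  # default address from the notes above

def build_generate_request(prompt, max_length=80, base_url=BASE_URL):
    """Build (but do not send) a POST request to the assumed
    KoboldAI-style /api/v1/generate endpoint."""
    body = json.dumps({"prompt": prompt, "max_length": max_length}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/api/v1/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(prompt, max_length=80):
    """Send the request and return the generated text (needs a running server)."""
    req = build_generate_request(prompt, max_length)
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    # Assumed response shape: {"results": [{"text": "..."}]}
    return result["results"][0]["text"]
```

With a model loaded, `print(generate("Once upon a time"))` should print the continuation. Building the request separately from sending it keeps the payload easy to inspect without a running server.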

For more information, be sure to run the program from command line with the --help flag.

LostRuins/koboldcpp v1.72 koboldcpp-1.72 on GitHub