koboldcpp-1.86

  • Integrated Gemma3 support. To use it, grab the GGUF model and a vision mmproj such as this one, and load both of them in KoboldCpp, just like earlier vision models. Everything else should work out of the box in Lite (click Add Img to paste or upload an image). Vision also works in SillyTavern via the custom Chat Completions API with inline images enabled (see the request sketch after this list).
  • Fixed OpenAI API finish_reason values and tool-calling behavior.
  • Re-enabled support for CUDA compute capability 3.7 (K80).
  • Added an option to save stories to Google Drive when running in Colab.
  • Added speculative decoding success rate information to /api/extra/perf/ (see the query sketch after this list).
  • Allowed downloading image generation LoRAs from URL launch arguments.
  • Added image generation parameter metadata to generated images (thanks @wbruna).
  • CI builds now also rebuild Vulkan shaders.
  • Replaced winclinfo.exe with a simpler version (see simpleclinfo.cpp) that only fetches GPU names.
  • Admin mode can now swap between GGUF model files at runtime, in addition to swapping between kcpps configs. When swapping models this way, default GPU layer counts and settings are used.
  • Updated Kobold Lite with multiple fixes and improvements:
    • Added a new instruct preset "KoboldCppAutomatic" which automatically obtains the instruct template from KoboldCpp.
    • Improvements and fixes for side panel mode
  • Merged fixes and improvements from upstream
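
The Gemma3 vision support above can also be exercised programmatically. Below is a minimal sketch in Python, assuming the server is already running on the default port 5001 with a vision mmproj loaded, and that KoboldCpp's OpenAI-compatible /v1/chat/completions endpoint accepts inline base64 images in the standard image_url format; the file name, prompt, and token limit are placeholders.

```python
# Minimal sketch: send an inline image to a Gemma3 vision model through the
# OpenAI-compatible Chat Completions endpoint of a running KoboldCpp instance.
# Assumes default port 5001 and a loaded vision mmproj; "photo.png" is a placeholder.
import base64
import json
import urllib.request

with open("photo.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "model": "gemma3",  # the model name is typically ignored, since only one model is loaded
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
    "max_tokens": 200,
}

req = urllib.request.Request(
    "http://localhost:5001/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["message"]["content"])
```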
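For the speculative decoding stats mentioned above, here is a minimal query sketch, assuming the server is running locally on the default port; the exact JSON field names are not assumed, so the full response is printed for inspection.

```python
# Minimal sketch: read performance stats, including the new speculative decoding
# success rate, from a running KoboldCpp instance on the default port 5001.
# No specific field names are assumed; the whole JSON response is printed.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:5001/api/extra/perf/") as resp:
    perf = json.loads(resp.read())

print(json.dumps(perf, indent=2))
```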

To use, download and run koboldcpp.exe, which is a one-file PyInstaller build.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
If you have an Nvidia GPU but an older CPU and koboldcpp.exe does not work, try koboldcpp_oldcpu.exe.
If you have a newer Nvidia GPU, you can use the CUDA 12 version koboldcpp_cu12.exe (much larger, slightly faster).
If you're using Linux, select the appropriate Linux binary file instead (not exe).
If you're on a modern macOS (M1, M2, M3), you can try the koboldcpp-mac-arm64 macOS binary.
If you're using AMD, we recommend trying the Vulkan option first (available in all releases) for the best support. Alternatively, you can try koboldcpp_rocm at YellowRoseCx's fork here.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once the model is loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
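
If you want to script against the server rather than use the browser UI, here is a minimal sketch of a completion request, assuming the standard KoboldAI-compatible /api/v1/generate endpoint on the default port; the prompt and sampler values are placeholders.

```python
# Minimal sketch: request a completion from a running KoboldCpp instance via
# the KoboldAI-compatible generate endpoint on the default port 5001.
# The prompt and sampler settings below are illustrative placeholders.
import json
import urllib.request

payload = {
    "prompt": "Once upon a time,",
    "max_length": 80,
    "temperature": 0.7,
}

req = urllib.request.Request(
    "http://localhost:5001/api/v1/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["results"][0]["text"])
```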

For more information, be sure to run the program from the command line with the --help flag. You can also refer to the readme and the wiki.
