koboldcpp-1.41 (beta)

It's been a while since the last release and quite a lot upstream has changed under the hood, so consider this release a beta.

Added support for LLAMA GGUF models, handled automatically. All older models will continue to work normally. Note that GGUF support for non-llama architectures has not been added yet.
Added --config flag to load a .kcpps settings file when launching from the command line (Credits: @poppeman). These files can also be imported and exported from the GUI.
Added a new endpoint, /api/extra/tokencount, which can be used to tokenize any string and accurately measure how many tokens it has.
Fix for bell characters occasionally causing the terminal to beep in debug mode.
Fix for incorrect list of backends & missing backends displayed in the GUI.
Set MMQ to be the default for CUDA when running from the GUI.
Updated Lite, and merged all the improvements and fixes from upstream.
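
As a rough illustration of the new /api/extra/tokencount endpoint listed above, here is a minimal Python sketch. The request and response shapes are assumptions not stated in these notes: it assumes the endpoint accepts a POST with a JSON body containing a "prompt" field and replies with a JSON object holding the count under "value". Check the running server before relying on either. The function names are made up for this example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5001"  # default local address given in these notes


def build_tokencount_request(text, base_url=BASE_URL):
    """Build (but do not send) a token-count request.

    Assumption: the endpoint takes a POST with a JSON body {"prompt": <text>}.
    """
    payload = json.dumps({"prompt": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/api/extra/tokencount",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def count_tokens(text, base_url=BASE_URL, timeout=30):
    """Send the request to a running koboldcpp instance and return the count."""
    req = build_tokencount_request(text, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # Assumption: the token count is returned under the "value" key.
    return body["value"]


if __name__ == "__main__":
    # Requires koboldcpp to be running with a model loaded.
    print(count_tokens("Hello, world!"))
```

Counting tokens on the server uses the loaded model's own tokenizer, which is why it is more accurate than estimating from character or word counts.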

To use, download and run koboldcpp.exe, which is a single-file PyInstaller executable.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once the model has loaded, you can connect at the address below (or use the full KoboldAI client):
http://localhost:5001
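
For scripted launches, the --config flag mentioned above could be driven from Python roughly as follows. This is only a sketch: the settings file name my_settings.kcpps and the helper build_launch_command are hypothetical, and it assumes koboldcpp.exe is in the current directory (or on PATH) and that --config takes the settings file path as its argument.

```python
import subprocess


def build_launch_command(config_path, executable="koboldcpp.exe"):
    """Return the argument list for launching koboldcpp with a saved config.

    --config is the flag named in these notes; the path is assumed to be
    passed as the value immediately following the flag.
    """
    return [executable, "--config", config_path]


if __name__ == "__main__":
    # "my_settings.kcpps" is a placeholder; export a real file from the GUI.
    cmd = build_launch_command("my_settings.kcpps")
    # Blocks until the koboldcpp process exits.
    subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems when the settings path contains spaces.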

For more information, run the program from the command line with the --help flag.

LostRuins/koboldcpp v1.41 koboldcpp-1.41 (beta) on GitHub