koboldcpp-1.27

KoboldCpp Changes:

  • Integrated the CLBlast GPU offloading improvements from @0cc4m, which allow a layer to be stored fully in VRAM instead of keeping a duplicate copy in RAM. As a result, offloading GPU layers now reduces overall RAM usage (see the example command after this list).
  • Pulled upstream support for OpenLlama 3B models.
  • Added support for the new version of RWKV.cpp models (v101) from @saharNooby, which uses the updated GGML library and is smaller and faster. Both the older and newer quantization formats are still detected and supported automatically, so the change is backwards compatible.
  • Added support for EOS tokens in RWKV.
  • Updated Kobold Lite. One new and exciting feature is AutoGenerated Memory, which summarizes your story into a short memory with a single click. It works best on instruct models.
  • Added the ability to rename the displayed model name, intended for use on the Horde. Passing --renamemodel lets you change the default name to any string, with a koboldcpp/ prefix added as suggested by Henky (see the example command after this list).
  • Fixed build errors on certain versions of OSX and Linux.
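
For example, a launch command combining the new options might look like the line below. This is a sketch only: the model filename, CLBlast platform/device indices, layer count, and display name are placeholders, so run with --help to confirm the exact flag syntax for your version:

    python koboldcpp.py model.bin --useclblast 0 0 --gpulayers 14 --renamemodel mymodel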

To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.

Once the model is loaded, you can connect with your browser at the address below (or use the full KoboldAI client):
http://localhost:5001
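
Since KoboldCpp emulates the KoboldAI API, you can also query it programmatically instead of using the browser UI. A minimal sketch, assuming the standard KoboldAI generate endpoint on the default port (the prompt and payload fields are illustrative):

    curl -X POST http://localhost:5001/api/v1/generate -H "Content-Type: application/json" -d "{\"prompt\": \"Once upon a time\", \"max_length\": 50}"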

For more information, be sure to run the program with the --help flag.
This release also includes a zip file containing the libraries and the koboldcpp.py script, for those who prefer not to use the one-file pyinstaller.
