github LostRuins/koboldcpp v1.28
koboldcpp-1.28


KoboldCpp Changes:

  • NEW: Added support for MPT models! To use larger context lengths, remember to set them with --contextsize; values up to around 5000 context tokens have been tested successfully.
  • The KoboldCpp Easy Launcher GUI has been enhanced! You can now set the number of CLBlast GPU layers in the GUI, as well as the number of threads to use. Additional toggles have also been added.
  • CLBlast now uses a more efficient memory allocation, so you should be able to offload more layers than before.
  • The flag --renamemodel has been renamed (lol) to --hordeconfig, and it now accepts two parameters: the horde name to display, and the advertised maximum generation length on the horde.
  • Fixed memory issues with Starcoder models. They still don't work very well with BLAS, especially on lower-RAM devices, so you may want to use a smaller --blasbatchsize with them, such as 64 or 128.
  • Added the option --blasbatchsize -1, which disables BLAS but still allows GPU layer offloading with CLBlast. This means if you don't use BLAS, you can offload EVEN MORE LAYERS and generate even faster (at the expense of slow prompt processing).
  • Minor tweaks and adjustments to default settings.
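As a quick illustration, here is a minimal sketch of a launch command combining the new v1.28 flags when running the koboldcpp.py script directly. The model filename is a placeholder, and the specific values shown are assumptions for illustration; the flag names themselves come from the notes above.

```python
# Sketch: assembling a koboldcpp launch command line with the v1.28 flags.
# "model.ggml" is a placeholder filename; adjust values for your setup.
args = [
    "python", "koboldcpp.py", "model.ggml",
    "--contextsize", "4096",            # larger context for MPT models (tested up to ~5000)
    "--blasbatchsize", "-1",            # disable BLAS but keep CLBlast GPU layer offloading
    "--hordeconfig", "MyModel", "256",  # horde display name + advertised max generation length
]
print(" ".join(args))
```

The same flags can of course be passed to koboldcpp.exe, or set through the Easy Launcher GUI where available.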

To use, download and run koboldcpp.exe, which is a one-file PyInstaller build.
Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.

Once loaded, you can connect in your browser like this (or use the full KoboldAI client):
http://localhost:5001
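For programmatic access, a running instance can also be queried over its KoboldAI-compatible HTTP API. The sketch below is a minimal example using only the standard library; the endpoint path /api/v1/generate and the payload fields are assumptions based on the KoboldAI-style API, so check --help or your client's documentation for specifics.

```python
import json
from urllib import request

# Sketch: sending a generation request to a locally running koboldcpp instance.
# Endpoint path and payload fields are assumed (KoboldAI-style API).
API_URL = "http://localhost:5001/api/v1/generate"
payload = {"prompt": "Once upon a time", "max_length": 80}

def generate(url=API_URL, data=payload):
    req = request.Request(
        url,
        data=json.dumps(data).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Preview the JSON body that would be sent (no server required for this part).
body = json.dumps(payload)
print(body)
```

Calling generate() requires koboldcpp to already be running on port 5001.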

For more information, be sure to run the program with the --help flag.
This release also includes a zip file containing the libraries and the koboldcpp.py script, for those who prefer not to use the one-file PyInstaller build.
