koboldcpp-1.15
- Added a brand new "Easy Mode" GUI, which triggers if no command-line arguments are set. This is aimed at being a noob-friendly way to get into KoboldCpp, but for full functionality you are still advised to run it from the command line with customized arguments. You can skip it by passing any command-line argument, or with the dedicated `--skiplauncher` flag, which does nothing else.
- Pulled the new q5_0 and q5_1 quantization format support for llama.cpp from upstream. Also pulled the q5 changes for the GPT-2, GPT-J and GPT-NeoX formats. Note that these will not work with CLBlast yet, but OpenBLAS should work fine.
- Added a new `--debugmode` flag, which shows the tokenized prompt being sent to the backend within the terminal window.
- Setting the `--stream` flag now automatically redirects the URL in the embedded Kobold Lite UI, so there is no need to type `?streaming=1` anymore (see the launch example after this list).
- Updated Kobold Lite, which now supports multiple custom stopping sequences that you can specify in the UI, separated with the `||$||` delimiter (e.g. `You:||$||The End` defines two sequences). Lite also now saves your custom stopping sequences into your save files and autosaves.
- Merged upstream fixes and improvements.
- Minor console fixes for Linux, and improved OSX compatibility.
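As a quick illustration of the new flags together, here is a minimal launch sketch; the model filename is purely illustrative, and on Linux/OSX the executable name will differ.

```python
import subprocess

# A minimal sketch combining flags from this release.
# "mymodel.ggml" is an illustrative filename; substitute your own model.
subprocess.run([
    "koboldcpp.exe",  # the one-file pyinstaller build (Windows)
    "mymodel.ggml",   # illustrative: any compatible ggml model
    "--stream",       # auto-redirects the embedded Kobold Lite UI
    "--debugmode",    # prints the tokenized prompt in the terminal
])
```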
To use, download and run koboldcpp.exe, which is a one-file pyinstaller build.
Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.
Once the model is loaded, you can connect in your browser (or point the full KoboldAI client) at:
http://localhost:5001
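If you prefer to script against the server instead of using the browser UI, a minimal sketch along these lines should work; note that the `/api/v1/generate` endpoint and the payload/response fields are assumptions based on the standard KoboldAI API, not something stated in these notes.

```python
import requests  # third-party: pip install requests

# Assumed KoboldAI-compatible endpoint; the path and JSON fields
# are assumptions based on the KoboldAI API, not from these notes.
resp = requests.post(
    "http://localhost:5001/api/v1/generate",
    json={"prompt": "Once upon a time,", "max_length": 32},
)
resp.raise_for_status()
print(resp.json()["results"][0]["text"])
```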
For more information, be sure to run the program with the `--help` flag.
Disclaimer: This version includes Cloudflare Insights in the Kobold Lite UI; it was subsequently removed in v1.17.