koboldcpp-1.21.3
KNOWN ISSUES: PLEASE READ
- If you are using v1.21.0 or v1.21.1, there's a misalignment with one of the structs which can cause some models to randomly output nonsense. Please update to v1.21.2.
- CLBlast appears to be broken with q8_0 formats in v1.21.0 through v1.21.2. Please update to v1.21.3.
Changes:
- Integrated the new quantization formats while maintaining backward compatibility for all older ggml model formats. This was a massive undertaking and it's possible there may be bugs, so please do let me know if anything is broken!
- Fixed some rare out of memory errors that occurred when using GPT2 models with BLAS.
- Updated Kobold Lite: New features include multicolor names, idle chat responses, toggle for the instruct prompts, and various minor fixes.
1.21.1 edit:
- Cleaned up some unnecessary prints regarding the BOS first token. Added an info message encouraging OSX users to use Accelerate instead of OpenBLAS (via --noblas), since it's usually faster.
1.21.2 edit:
- Fixed an error with the OpenCL kernel failing to compile on certain platforms. Please help check.
- Fixed a problem where logits would sometimes be NaN due to an unhandled change in the size of the Q8_1 struct. This also affected other formats such as NeoX, RedPajama, and GPT2, so upgrading to v1.21.2 is recommended.
1.21.3 edit:
- Recognize q8_0 as an older format, since the new CLBlast kernel doesn't work correctly with it.
To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.
Once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
For more information, be sure to run the program with the --help
flag.
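Once the server is running, you can also query it programmatically rather than through the browser UI. Below is a minimal sketch that assumes the KoboldAI-compatible `/api/v1/generate` endpoint and illustrative field names (`prompt`, `max_length`, `results`); check your version's API for the exact schema.

```python
import json
from urllib import request

def generate(prompt, max_length=80, host="http://localhost:5001"):
    """Send a prompt to a running koboldcpp server and return the completion.

    Assumes the KoboldAI-compatible /api/v1/generate endpoint; the payload
    fields shown here are illustrative and may differ between versions.
    """
    payload = json.dumps({"prompt": prompt, "max_length": max_length}).encode()
    req = request.Request(
        host + "/api/v1/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["results"][0]["text"]
```

For example, `generate("Once upon a time,")` would return the model's continuation as a string, provided the server is up on port 5001.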
This release also includes a zip file containing the libraries and the koboldcpp.py
script, for those who prefer not to use the one-file pyinstaller.