koboldcpp-1.20
- Added an option to allocate more RAM for very large context sizes, so you can test models with a context longer than 2048 tokens. Set it with the --contextsize flag.
- Added experimental support for the new RedPajama variant of GPT-NeoX models. Because the model format is nearly identical to Pythia's, the two are hard to tell apart, which made this particularly tricky to implement. Detection currently relies on a very ugly hack; if it fails, you can always force it with the --forceversion flag.
To use, download and run koboldcpp.exe, which is a one-file PyInstaller build.
Either drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.
Once the model is loaded, you can connect at this address (or use the full KoboldAI client):
http://localhost:5001
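If you want to script the "once loaded" step, a minimal sketch like the one below polls the address above and reports whether anything answers over HTTP. It only checks reachability and assumes nothing about the endpoints the server offers. The function name `server_is_up` and the timeout value are made up for this illustration and are not part of koboldcpp.

```python
import urllib.error
import urllib.request


def server_is_up(url: str = "http://localhost:5001", timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP at `url`, False otherwise.

    Hypothetical helper for illustration; not part of koboldcpp.
    """
    # Bypass any configured proxy, since the target is a local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except OSError:
        # Connection refused, timeout, and similar (URLError is an OSError).
        return False


if __name__ == "__main__":
    print("reachable" if server_is_up() else "not reachable yet")
```

A `False` result most likely means the model is still loading or the program is not running.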
For more information, be sure to run the program with the --help flag.
This release also includes a zip file containing the libraries and the koboldcpp.py script, for those who prefer not to use the one-file PyInstaller build.
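For those running the script from the zip, here is a rough sketch of how a launch command using the flags mentioned in these notes could be assembled. It rests on several assumptions: that the model path is accepted as a positional argument, that --contextsize and --forceversion each take a numeric value, and that 4096 is an accepted context size. Check the --help output before relying on any of them. The helper `build_command` and the file name `model.bin` are hypothetical.

```python
import sys
from typing import List, Optional


def build_command(model_path: str,
                  context_size: Optional[int] = None,
                  force_version: Optional[int] = None,
                  script: str = "koboldcpp.py") -> List[str]:
    """Assemble an argument list for launching the script.

    Hypothetical helper; argument shapes are assumptions, see --help.
    """
    cmd = [sys.executable, script, model_path]
    if context_size is not None:
        # Assumed form: --contextsize <number of tokens>
        cmd += ["--contextsize", str(context_size)]
    if force_version is not None:
        # Assumed form: --forceversion <numeric format version>
        cmd += ["--forceversion", str(force_version)]
    return cmd


if __name__ == "__main__":
    # "model.bin" is a placeholder path; 4096 is an illustrative value.
    cmd = build_command("model.bin", context_size=4096)
    print(" ".join(cmd))
    # To actually launch it: subprocess.run(cmd, check=True)
```

Building the command as a list instead of a single string avoids shell quoting problems when the model path contains spaces.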