koboldcpp-1.0.9beta
- Integrated support for GPT2! This should also work with Cerebras models in theory, but I have not tried those yet. This is a great way to get started, as now you can try models so tiny that even a potato CPU can run them. Here's a good one to start with: https://huggingface.co/ggerganov/ggml/resolve/main/ggml-model-gpt-2-117M.bin with which I can generate 100 tokens in a second.
- Upgraded embedded Kobold Lite to support a Stanford Alpaca compatible Instruct Mode, which can be enabled in settings.
- Removed all `-march=native` and `-mtune=native` flags when building the binary. Compatibility should now be more consistent across different devices.
- Fixed an incorrect flag name used to trigger the ACCELERATE library on macOS. This should greatly increase performance for macOS users running GPT-J and GPT-2 models, assuming you have ACCELERATE support.
- Added Rep Pen (repetition penalty) for GPT-J and GPT-2 models, and by extension pyg.cpp. Repetition penalty now works similarly to the way it does in llama.cpp.
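For anyone curious how llama.cpp-style repetition penalty behaves, here is a minimal Python sketch. The function name and signature are illustrative, not the actual code: the idea is that logits of previously generated tokens are divided by the penalty when positive and multiplied by it when negative, so a penalized token always becomes less likely.

```python
def apply_rep_pen(logits, prev_tokens, penalty=1.1):
    """Penalize logits of tokens that have already been generated.

    Positive logits are divided by the penalty and negative logits are
    multiplied by it, so the adjustment always lowers the token's
    probability regardless of sign (the llama.cpp convention).
    """
    out = list(logits)
    for t in set(prev_tokens):
        if out[t] > 0:
            out[t] /= penalty
        else:
            out[t] *= penalty
    return out
```

With `penalty=1.0` the logits are unchanged; larger values suppress repeats more aggressively.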
To use, download and run koboldcpp.exe.
Alternatively, drag and drop a compatible ggml model onto the .exe, or run it and manually select the model in the popup dialog.
Once the model is loaded, you can connect in your browser (or use the full KoboldAI client) at:
http://localhost:5001
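If you'd rather hit the API directly than use the browser UI, here is a hedged sketch using only the Python standard library. The endpoint path and field names (`/api/v1/generate`, `prompt`, `max_length`, `rep_pen`) follow the KoboldAI-style generate API that koboldcpp exposes, but treat them as assumptions and check the running server for the exact schema.

```python
import json
from urllib import request

# Assumed endpoint: the KoboldAI-style generate route on the default port.
API_URL = "http://localhost:5001/api/v1/generate"

# Assumed request fields; rep_pen is the repetition penalty added in this release.
payload = {
    "prompt": "Once upon a time",
    "max_length": 50,   # number of tokens to generate
    "rep_pen": 1.1,     # repetition penalty
}

def generate(body, url=API_URL):
    """POST the JSON payload to the server and return the decoded response."""
    req = request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

Call `generate(payload)` while koboldcpp is running with a model loaded; the response carries the generated text.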