llamacpp-for-kobold-1.0.1
- Bugfixes for OSX; KV caching now allows continuing a previous generation without reprocessing the whole prompt (the idea is sketched below)
- Weights not included.
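
For context on the KV caching point above: when a new request's prompt shares a prefix with the previous one, only the new tokens need a forward pass, since the attention key/value state for the shared prefix is already cached. A minimal, hypothetical Python sketch of that idea (a toy model, not the project's actual code or API):

    def common_prefix_len(a, b):
        # Length of the token prefix shared by the cached and new prompt.
        n = 0
        for x, y in zip(a, b):
            if x != y:
                break
            n += 1
        return n

    class ToyKVModel:
        # Toy stand-in for the real model: just counts forward passes.
        def __init__(self):
            self.cached = []  # tokens whose KV state is already cached
            self.evals = 0    # token forward passes actually performed

        def process(self, prompt):
            keep = common_prefix_len(self.cached, prompt)
            # Only the suffix past the shared prefix is evaluated;
            # the KV cache covers the first `keep` tokens.
            self.evals += len(prompt) - keep
            self.cached = list(prompt)

    m = ToyKVModel()
    m.process([1, 2, 3, 4])        # first request: 4 tokens evaluated
    m.process([1, 2, 3, 4, 5, 6])  # continuation: only 2 more evaluated
    print(m.evals)                 # 6, not 10
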
To use, download, extract and run (the default port is 5001):
llama_for_kobold.py [ggml_quant_model.bin] [port]
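For example, with a quantized GGML model file (the filename here is just a placeholder; substitute your own):
llama_for_kobold.py ggml-model-q4_0.bin 5001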
Then you can connect like this (or use the full KoboldAI client):
https://lite.koboldai.net/?local=1&port=5001