llamacpp-for-kobold-1.0.3
- Applied the massive refactor from the parent repo. It was a huge pain, but I managed to keep the old tokenizer untouched and retain full support for the original model formats.
- Greatly reduced the default batch size, as large batch sizes were causing bad output and high memory usage.
- Support dynamic context lengths sent from the client (see the sketch after this list).
- TavernAI works, although I wouldn't recommend it: it spams the server with multiple huge-context requests, so you're going to have a very painful time getting responses.
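
To illustrate the dynamic context length feature, here is a minimal sketch of a client-side request, assuming a KoboldAI-compatible /api/v1/generate endpoint on the default port; the endpoint path and the max_context_length / max_length field names are assumptions for illustration, not a confirmed spec for this release:

```python
# Minimal sketch of a client request that picks its own context length.
# Assumptions: server running on the default port 5001 and exposing a
# KoboldAI-style /api/v1/generate endpoint that accepts these JSON fields.
import json
import urllib.request

payload = {
    "prompt": "Once upon a time,",
    "max_context_length": 1024,  # context window requested by the client
    "max_length": 80,            # number of tokens to generate
}

req = urllib.request.Request(
    "http://localhost:5001/api/v1/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read().decode("utf-8")))
```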
Weights not included.
To use, download, extract, and run (the default port is 5001):
llama_for_kobold.py [ggml_quant_model.bin] [port]
and then you can connect like this (or use the full KoboldAI client):
http://localhost:5001
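
For example, a hypothetical invocation (the model filename below is just a placeholder for whatever quantized GGML file you have):

llama_for_kobold.py ggml-model-q4_0.bin 5001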