koboldcpp-1.3
-Bug fixes for various issues (missing endpoints, malformed URLs)
-Merged upstream file loading enhancements. mmap is now disabled by default; enable it with --usemmap
-Can now automatically distinguish between older and newer GPTJ and GPT2 quantized files.
-Version numbers are now displayed at startup
To use, download and run koboldcpp.exe, which is a one-file pyinstaller build.
Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.
Once the model is loaded, you can connect in your browser (or use the full KoboldAI client) at:
http://localhost:5001
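The running server exposes a KoboldAI-compatible HTTP API at that address. Below is a minimal Python sketch of a client request; the /api/v1/generate endpoint and the parameter names are assumptions based on the KoboldAI API, not something these notes specify, so check the client documentation for the full parameter set.

```python
import json
import urllib.request

def build_payload(prompt, max_length=50):
    # Request body for a KoboldAI-style generate call (parameter names
    # are assumptions; consult the KoboldAI API docs for the full set).
    return json.dumps({"prompt": prompt, "max_length": max_length})

def generate(prompt, host="http://localhost:5001"):
    # POST the prompt to the server started by koboldcpp.
    req = urllib.request.Request(
        host + "/api/v1/generate",
        data=build_payload(prompt).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Requires a running koboldcpp instance:
# print(generate("Once upon a time"))
```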
Alternative Options:
If your CPU is very old and doesn't support AVX2 instructions, you can try running the noavx2 version. It will be slower.
If you prefer, you can download the zip file, extract it, and run the Python script manually, e.g. koboldcpp.py [ggml_model.bin]
To quantize an fp16 model, you can use the quantize.exe included in tools.zip.