koboldcpp-1.3
-Bug fixes for various issues (missing endpoints, malformed URLs)
-Merged upstream file loading enhancements. mmap is now disabled by default; enable it with --usemmap
-Can now automatically distinguish between older and newer GPTJ and GPT2 quantized files.
-Version numbers are now displayed at startup
To use, download and run koboldcpp.exe, which is a one-file pyinstaller build.
Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.
Once the model is loaded, you can connect in your browser (or use the full KoboldAI client) at:
http://localhost:5001
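The running server exposes a KoboldAI-compatible HTTP API at that address. Below is a minimal Python sketch of a client request; the /api/v1/generate endpoint and the parameter names are assumptions based on the KoboldAI API, not something these notes specify, so check the client documentation for the full parameter set.

```python
import json
import urllib.request

def build_payload(prompt, max_length=50):
    # Request body for a KoboldAI-style generate call (parameter names
    # are assumptions; consult the KoboldAI API docs for the full set).
    return json.dumps({"prompt": prompt, "max_length": max_length})

def generate(prompt, host="http://localhost:5001"):
    # POST the prompt to the server started by koboldcpp.
    req = urllib.request.Request(
        host + "/api/v1/generate",
        data=build_payload(prompt).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Requires a running koboldcpp instance:
# print(generate("Once upon a time"))
```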
Alternative Options:
If your CPU is very old and doesn't support AVX2 instructions, you can try running the noavx2 version. It will be slower.
If you prefer, you can download the zip file, extract it, and run the Python script manually, e.g. koboldcpp.py [ggml_model.bin]
To quantize an fp16 model, you can use the quantize.exe included in tools.zip.