LostRuins/koboldcpp v1.5 on GitHub

koboldcpp-1.5

This release consolidates many upstream bug fixes and improvements. If you had issues with earlier versions, please try this one. The upstreamed GPT-J changes should also speed up GPT-J-6B inference by roughly another 20%.
Integrated AVX2 and non-AVX2 support into the same binary for Windows. If your CPU is very old and doesn't support AVX2 instructions, you can switch to compatibility mode with --noavx2, but it will be slower.
Now has integrated experimental CLBlast support thanks to @0cc4m, which uses your GPU to speed up prompt processing. Enable it with --useclblast [platform_id] [device_id].
To quantize fp16 models, you can use the quantizers in tools.zip. Remember to convert the models from PyTorch/Hugging Face format first with the relevant Python conversion scripts.

To use, download and run koboldcpp.exe, which is a single-file PyInstaller executable.
Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.

Once the model is loaded, you can connect at the following address (or use the full KoboldAI client):
http://localhost:5001
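
If the page does not load, it can help to check whether anything is listening on that port yet. The snippet below is a minimal sketch, not part of koboldcpp: the helper name is_port_open is hypothetical, and it only tests that a TCP connection to localhost:5001 succeeds. It does not verify that the listener is actually koboldcpp.

```python
import socket


def is_port_open(host="localhost", port=5001, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Something is listening on http://localhost:5001")
    else:
        print("Nothing is listening on port 5001 yet")
```

Large models can take a while to load, so a failed check shortly after launch may simply mean the server has not started yet.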

For more information, be sure to run the program with the --help flag.

Alternative Options:
The non-AVX2 version is now included in the same .exe file; enable it with the --noavx2 flag.
If you prefer, you can download the zip file, extract it, and run the Python script manually, e.g. koboldcpp.py [ggml_model.bin]
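
For scripted launches, the options above can be assembled into a command line. This is only a sketch under stated assumptions: build_command is a hypothetical helper, not part of koboldcpp; it uses only the flags named in these notes (--noavx2 and --useclblast); and the model filename and the platform and device IDs are placeholders to replace with your own values.

```python
import sys


def build_command(model_path, noavx2=False, clblast=None, script="koboldcpp.py"):
    """Build an argv list for running the Python script with the noted flags.

    clblast, if given, is a (platform_id, device_id) pair.
    """
    cmd = [sys.executable, script, model_path]
    if noavx2:
        # Compatibility mode for CPUs without AVX2 (slower).
        cmd.append("--noavx2")
    if clblast is not None:
        platform_id, device_id = clblast
        # Experimental GPU-accelerated prompt processing.
        cmd += ["--useclblast", str(platform_id), str(device_id)]
    return cmd


if __name__ == "__main__":
    # Placeholder model name and IDs; adjust for your system.
    print(" ".join(build_command("ggml_model.bin", clblast=(0, 0))))
```

The resulting list can be passed to subprocess.run to start the server. Run the program with --help to confirm the options available in your build.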
