koboldcpp-1.16


  • Integrated the overhauled Token Samplers. The whole sampling system has been reworked: Top-P, Top-K and Rep Pen now use the same sampling functions across all model architectures and types. Also added 2 new samplers: Tail Free Sampling (TFS) and Typical Sampling (see the sketch after this list). As I did not test the new implementations for correctness, please let me know if you experience weird results (or degradations in the previous samplers).
  • Integrated CLBlast support for the q5_0 and q5_1 formats. Note: the upstream llama.cpp repo has completely removed support for the q4_3 format. For now I still plan to keep q4_3 support available within KoboldCpp, but you are strongly advised not to use q4_3 anymore. Please switch to another format, or reconvert your q4_3 models, if you can.
  • Fixed a few edge cases where GPT-2 models ran out of memory (OOM) with small batch sizes.
  • Fixed a regression where older GPT-J models (e.g. the original model from Alpin's Pyg.cpp fork) failed to load due to upstream changes in the GGML library. You are strongly advised not to use outdated formats; reconvert if you can, as the newer formats are faster.
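
For those curious how the new TFS sampler works, here is a minimal Python sketch of the idea (an illustration only, not the actual C implementation inside KoboldCpp/GGML; the function name, defaults, and exact cutoff handling are assumptions): it sorts the token probabilities, measures where the sorted curve flattens out via its second derivative, and drops the flat "tail" once the cumulative curvature passes a threshold z.

```python
import numpy as np

def tail_free_sample(logits, z=0.95, rng=None):
    """Illustrative Tail Free Sampling: trim the flat 'tail' of the
    token distribution, then sample from what remains.
    Assumes at least three candidate tokens."""
    if rng is None:
        rng = np.random.default_rng()

    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    order = np.argsort(probs)[::-1]                 # token ids, most likely first
    sorted_probs = probs[order]

    curvature = np.abs(np.diff(sorted_probs, n=2))  # second derivative of the curve
    curvature /= curvature.sum()                    # normalize to sum to 1

    # Keep tokens until the cumulative curvature first exceeds z;
    # always keep at least one token.
    cutoff = int(np.searchsorted(np.cumsum(curvature), z, side="right"))
    keep = order[: max(cutoff, 1)]

    kept_probs = probs[keep] / probs[keep].sum()    # renormalize survivors
    return rng.choice(keep, p=kept_probs)
```

Setting z close to 1.0 keeps nearly all tokens, while lower values trim the tail more aggressively.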

To use, download and run koboldcpp.exe, which is a one-file PyInstaller build.
Alternatively, drag and drop a compatible ggml model onto the .exe, or run it and manually select the model in the popup dialog.

Once the model is loaded, you can connect with your browser like this (or use the full KoboldAI client):
http://localhost:5001
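
If you would rather script against the running server than use the browser UI, KoboldCpp exposes a KoboldAI-compatible API on the same port. Here is a minimal sketch; the sampler fields shown, including the new tfs value, are assumptions based on the KoboldAI generate API, so verify them against the docs for your version.

```python
import requests

# Minimal sketch of calling the KoboldAI-compatible generate endpoint.
# The payload fields below are assumptions based on the KoboldAI API.
payload = {
    "prompt": "Once upon a time,",
    "max_length": 80,       # tokens to generate
    "temperature": 0.7,
    "top_p": 0.9,
    "tfs": 0.95,            # Tail Free Sampling threshold (new in 1.16)
    "rep_pen": 1.1,
}

resp = requests.post("http://localhost:5001/api/v1/generate", json=payload)
resp.raise_for_status()

# Response shape assumed to follow the KoboldAI API:
# {"results": [{"text": "..."}]}
print(resp.json()["results"][0]["text"])
```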

For more information, be sure to run the program with the --help flag.

Disclaimer: this version includes Cloudflare Insights in the Kobold Lite UI; it was subsequently removed in v1.17.
