koboldcpp-1.26
KoboldCpp Changes:
- NEW! You can now view Token Probabilities when using `--debugmode`. When enabled, for every generated token, the console will display the probabilities of up to 4 alternative possible tokens. It's a good way to gauge how biased/confident/overtrained a model is. The probability percentages shown are taken after all the samplers have been applied, so it's also a great way to test your sampler configurations and see how well they work. `--debugmode` also displays the contents of your input and context, along with their token IDs. Note that using `--debugmode` incurs a slight performance hit, so it is off by default.
- NEW! The Top-A sampler has been added! This is my own implementation of a special Kobold-exclusive sampler that does not exist in the upstream llama.cpp repo. This sampler reduces the randomness of the AI whenever the probability of one token is much higher than all the others, proportional to the squared softmax probability of the most probable token. Higher values have a stronger effect. Set this value to 0 to disable its effect. (See the sketch after this list.)
- Added support for the Starcoder and Starcoder Chat models.
- Cleaned up and slightly refactored the sampler code; EOS stop tokens should now work for all model types. Use `--unbantokens` to enable this. Additionally, the left square bracket `[` token is no longer banned by default, as modern models don't really need it and its token IDs were inconsistent across architectures.
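For readers curious about the mechanics, here is a minimal NumPy sketch of the Top-A rule described above: tokens whose softmax probability falls below `top_a` times the square of the top token's probability are removed. The `top_a_filter` helper is hypothetical and for illustration only; the actual sampler is implemented in KoboldCpp's C++ code.

```python
import numpy as np

def top_a_filter(logits: np.ndarray, top_a: float) -> np.ndarray:
    """Illustrative Top-A filtering (not the actual KoboldCpp implementation).

    Drops tokens whose softmax probability is below top_a * (max prob)^2.
    A top_a of 0 disables the filter, matching the release notes.
    """
    if top_a <= 0.0:
        return logits  # disabled
    # Softmax over the raw logits (numerically stabilized).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Threshold scales with the SQUARE of the top token's probability,
    # so the filter bites hardest when one token dominates.
    threshold = top_a * probs.max() ** 2
    filtered = logits.copy()
    filtered[probs < threshold] = -np.inf  # remove sub-threshold tokens
    return filtered
```

Because the threshold scales with the square of the leading probability, the filter is aggressive when the model is very confident and nearly a no-op when the distribution is flat.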
To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.
Once the model is loaded, you can connect like this (or use the full koboldai client):
http://localhost:5001
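You can also talk to the server programmatically. A minimal sketch in Python, assuming the KoboldAI-compatible `/api/v1/generate` endpoint that KoboldCpp serves and the third-party `requests` library (field names follow the KoboldAI API and may vary with your setup):

```python
import requests

# Send a prompt to a locally running KoboldCpp instance and print the reply.
resp = requests.post(
    "http://localhost:5001/api/v1/generate",
    json={"prompt": "Once upon a time", "max_length": 50},
)
print(resp.json()["results"][0]["text"])
```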
For more information, be sure to run the program with the `--help` flag.
This release also includes a zip file containing the libraries and the `koboldcpp.py` script, for those who prefer not to use the one-file pyinstaller.