koboldcpp-1.76

shivers down your spine edition

NEW: Added Anti-Slop Sampling (Phrase Banning) - You can now provide a list of words or phrases that must not be generated. When one of them appears, the output is backtracked and regenerated. This capability has been merged into the existing token banning feature, and is also aliased to the banned_strings field.
- Note: When using Anti-Slop phrase banning, streaming outputs are slightly delayed - this is to allow space for the AI to backtrack a response if necessary. This delay is proportional to the length of the longest banned slop phrase.
- Up to 48 phrase banning sequences can be used. They are not case sensitive.
The /api/extra/perf/ endpoint now reports whether the instance was launched in quiet mode (which hides terminal outputs). Note that this is not foolproof - instances can be running modified versions of KoboldCpp.
Added timestamp information when each request starts.
Increased some limits for number of stop sequences, logit biases, and banned phrases.
Fixed a GUI launcher bug where a changed backend dropdown was overridden by a CLI flag.
Updated Kobold Lite with multiple fixes and improvements:
- NEW: Added a new scenario - Roleplay Character Creator. This Kobold Lite scenario presents users with an easy-to-use wizard for creating their own roleplay bots with the Aesthetic UI. Simply fill in the requested fields and you're good to go. The character can always be edited subsequently from the 'Context' menu. Alternatively, you can also load a pre-existing Tavern Character Card.
- Updated token banning settings to include Phrase Banning (Anti-Slop).
- Minor fixes and tweaks
Merged fixes and improvements from upstream
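
As a rough illustration of the phrase banning described above, here is a minimal sketch of how a client might prepare a request body. The banned_strings field name and the 48-phrase limit come from these notes. The helper name build_generate_payload is hypothetical, and the prompt and max_length fields are assumed to follow the usual KoboldAI-style generate request, so check the API documentation of your build before relying on them.

```python
import json

MAX_BANNED_PHRASES = 48  # limit stated in these release notes


def build_generate_payload(prompt, banned_phrases, max_length=120):
    """Hypothetical helper: build a request body with Anti-Slop phrases.

    Assumes a KoboldAI-style body with "prompt" and "max_length";
    "banned_strings" is the field named in the notes.
    """
    seen = set()
    unique = []
    for phrase in banned_phrases:
        cleaned = phrase.strip()
        key = cleaned.lower()  # banning is not case sensitive, so dedupe on lowercase
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    if len(unique) > MAX_BANNED_PHRASES:
        raise ValueError(
            f"{len(unique)} phrases given, at most {MAX_BANNED_PHRASES} are supported"
        )
    return {
        "prompt": prompt,
        "max_length": max_length,
        "banned_strings": unique,
    }


payload = build_generate_payload(
    "Once upon a time",
    ["shivers down your spine", "Shivers Down Your Spine", "barely above a whisper"],
)
print(json.dumps(payload, indent=2))
```

Deduplicating on the lowercased phrase keeps case variants from using up slots in the 48-phrase budget. Since the streaming delay grows with the longest banned phrase, shorter phrases should keep streamed output more responsive.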

To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
If you have an Nvidia GPU, but use an old CPU and koboldcpp.exe does not work, try koboldcpp_oldcpu.exe
If you have a newer Nvidia GPU, you can use the CUDA 12 version koboldcpp_cu12.exe (much larger, slightly faster).
If you're using Linux, select the appropriate Linux binary file instead (not exe).
If you're on a modern Mac (M1, M2, M3), you can try the koboldcpp-mac-arm64 macOS binary.
If you're using AMD, you can try koboldcpp_rocm from YellowRoseCx's fork.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once the model has loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
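
Besides opening that address in a browser, a script can talk to the same server. The sketch below uses only the Python standard library and assumes the instance exposes a KoboldAI-compatible /api/v1/generate endpoint that returns JSON shaped like {"results": [{"text": ...}]}. The function names are hypothetical, and the endpoint path and response shape should be verified against your running instance.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5001"  # default address given in these notes


def make_generate_request(prompt, base_url=BASE_URL, max_length=80):
    """Build (but do not send) a POST request for text generation.

    The endpoint path and body fields are assumptions based on the
    KoboldAI-style API, not something stated in these notes.
    """
    body = json.dumps({"prompt": prompt, "max_length": max_length}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, base_url=BASE_URL):
    """Send the request and return the generated text (needs a running server)."""
    request = make_generate_request(prompt, base_url)
    with urllib.request.urlopen(request, timeout=120) as response:
        reply = json.loads(response.read().decode("utf-8"))
    return reply["results"][0]["text"]  # assumed response shape
```

With a model loaded, calling generate("Hello") should return the continuation as a string. Keeping request construction separate from sending makes the request easy to inspect without a live server.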

For more information, be sure to run the program from the command line with the --help flag.
