koboldcpp-1.63

Enable Sound, Press Play (demo video: kobo_gif.mp4)
  • Added support for special tokens in stop_sequences. If you set <|eot_id|> as a stop sequence and it tokenizes into a single token, it will simply work and function like the EOS token, allowing multiple EOS-like tokens (see the example after this list).
  • Reworked the Automatic RoPE scaling calculations to support Llama3 (just specify the desired --contextsize and it will trigger automatically).
  • Added a console warning if another program is already using the desired port.
  • Improved server handling for bad or empty requests, which fixes a potential flooding vulnerability.
  • Fixed a scenario where the BOS token could get lost, potentially resulting in lower quality especially during context-shifting.
  • Pulled and merged new model support, improvements and fixes from upstream.
  • Updated Kobold Lite: fixed markdown, reworked the memory layout, added a regex replacer feature, aesthetic background color settings, more save slots, usermod saving, and a Llama3 prompt template.
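
Here is a minimal sketch of the new stop-sequence behavior through the KoboldAI-compatible HTTP API, assuming a server is already running on the default port; the prompt text is illustrative, and the exact response shape follows the standard /api/v1/generate route:

```python
# Sketch: stop generation on the Llama3 <|eot_id|> special token.
# Assumes KoboldCpp is serving a Llama3 model at the default address.
import json
import urllib.request

payload = {
    "prompt": "<|start_header_id|>user<|end_header_id|>\n\nHello!<|eot_id|>",
    "max_length": 128,
    # <|eot_id|> tokenizes to a single special token for Llama3, so as of
    # v1.63 generation halts on it just as it would on the regular EOS token.
    "stop_sequence": ["<|eot_id|>"],
}

req = urllib.request.Request(
    "http://localhost:5001/api/v1/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)
    print(result["results"][0]["text"])
```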

Edit: Windows Defender seems to be flagging the CI-built binary. I've replaced it with a locally built one until I can figure out why.

To use, download and run koboldcpp.exe, which is a one-file PyInstaller build.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller.
If you're using AMD, you can try koboldcpp_rocm from YellowRoseCx's fork.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once loaded, you can connect at http://localhost:5001 (or use the full KoboldAI client).
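
For example, a typical command-line launch might look like this (the model filename is illustrative; the flags are documented under --help):

```
koboldcpp.exe --model llama3-8b.Q4_K_M.gguf --contextsize 8192 --port 5001
```

With --contextsize set, the reworked automatic RoPE scaling from this release is applied with no further configuration.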

For more information, be sure to run the program from the command line with the --help flag.
