koboldcpp-1.51.1

all quiet on the kobold front edition

  • Added a new flag --quiet, which suppresses prompt inputs and generated outputs from appearing in the console.
  • When context shift is enabled, allocate a small amount of reserved space (about 80 tokens) to reduce the Failed to predict errors caused by running out of KV cache space when the cache becomes fragmented during shifting.
  • Auto RoPE scaling is no longer applied if the model already overrides the RoPE freq scale with a value below 1.
  • Increased the graph node limit for older models to fix AiDungeon GPT2 not working.
  • Display the available KAI and OAI endpoint URLs in the terminal on startup (see the sketch after this list).
  • Updated some API examples in the documentation.
  • --multiuser now accepts an extra optional parameter that indicates how many concurrent requests are allowed to queue. If unset, or set to 1, it uses the default value of 5.
  • Pulled fixes and improvements from upstream, updated Kobold Lite, fixed Chub imports, optimized for Firefox, added multiline input in aesthetic mode, and made various other improvements.
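
For a quick way to verify the KAI and OAI endpoint URLs mentioned above, here is a minimal Python sketch. It assumes the default port 5001 and the standard KoboldAI /api/v1/model and OpenAI-compatible /v1/models routes; adjust the base URL to match whatever koboldcpp prints at startup.

```python
import requests

BASE = "http://localhost:5001"  # default koboldcpp port; change if you launched with --port

# KoboldAI (KAI) API: report the currently loaded model
kai = requests.get(f"{BASE}/api/v1/model", timeout=10)
print("KAI model:", kai.json())

# OpenAI-compatible (OAI) API: list the available models
oai = requests.get(f"{BASE}/v1/models", timeout=10)
print("OAI models:", oai.json())
```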

1.51.1 Hotfix:

  • Reverted an upstream change that caused a CLBlast segfault when the context size exceeded 2K.
  • Stripped out the OAI SSE carriage return after the end message that was causing issues in Janitor.
  • Moved the 80 extra tokens allocated for handling KV fragmentation so they are added on top of the specified max context length instead of being subtracted from it at runtime, since the old behavior could cause padding issues when counting tokens in Tavern. For example, loading with --contextsize 2048 now actually allocates a size of 2128 behind the scenes (see the small sketch after this list).
  • Changed the API URL printouts to include the tunnel URL when using --remotetunnel.
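
As a tiny illustrative sketch of the allocation change above (not the actual koboldcpp code), the reserve is now stacked on top of the requested context size rather than carved out of it:

```python
KV_RESERVE_TOKENS = 80  # approximate reserve described in the notes above

def allocated_context(requested_contextsize: int) -> int:
    # New behavior: add the fragmentation reserve on top of the requested size.
    return requested_contextsize + KV_RESERVE_TOKENS

print(allocated_context(2048))  # -> 2128
```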

Added a Linux test build provided by @henk717.

To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
If you're using AMD, you can try the koboldcpp_rocm build from YellowRoseCx's fork.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once loaded, you can connect like this (or use the full koboldai client):
http://localhost:5001
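
If you prefer to connect programmatically instead of through the browser, the following is a hedged sketch using the KoboldAI generate route (/api/v1/generate) that the koboldcpp server exposes on that address; field names follow the KoboldAI United API and may need adjusting for your setup.

```python
import requests

payload = {
    "prompt": "Once upon a time,",
    "max_length": 80,             # number of tokens to generate
    "max_context_length": 2048,   # should match the loaded --contextsize
    "temperature": 0.7,
}

resp = requests.post("http://localhost:5001/api/v1/generate", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["results"][0]["text"])
```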

For more information, be sure to run the program from the command line with the --help flag.
