koboldcpp-1.31.2

This is mostly a bugfix build, with some new features for Lite.

Better EOS token handling for Starcoder models.
Major Kobold Lite update, including new scenarios, a variety of bug fixes, italicized chat text, customized idle message counts, and improved sentence trimming behavior.
Disabled RWKV sequence mode. Unfortunately, the speedups were too situational, and some users experienced speed regressions. Additionally, it did not work without modifying the ggml library to increase the max node count, which had adverse impacts on other model architectures. Sequence mode will remain disabled until it has been sufficiently improved upstream.
Display the token generation rate in the console.
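
As an illustration of the token generation rate item above, here is a minimal sketch of how such a figure can be computed and formatted. The function names and the output format are hypothetical and are not taken from koboldcpp's source.

```python
def tokens_per_second(token_count: int, elapsed_seconds: float) -> float:
    """Return the generation rate, guarding against a zero or negative duration."""
    if elapsed_seconds <= 0:
        return 0.0
    return token_count / elapsed_seconds


def format_rate(token_count: int, elapsed_seconds: float) -> str:
    """Build a console line; the wording here is an assumption, not koboldcpp's real output."""
    rate = tokens_per_second(token_count, elapsed_seconds)
    return f"Generated {token_count} tokens in {elapsed_seconds:.2f}s ({rate:.1f} T/s)"
```

In practice the elapsed time would be measured around the generation call, for example with `time.perf_counter()`.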

Update 1.31.1:

Cleaned up debug output. Server endpoint debug messages are now shown only if --debugmode is set. Also, incoming horde prompts are no longer shown when --hordeconfig is set, unless --debugmode is also enabled.
Fixed markdown in Lite.
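
The logging rules described for 1.31.1 can be summarized as two boolean conditions. This is only a sketch of the behavior stated above; the helper names are hypothetical and do not come from koboldcpp's code.

```python
def should_log_endpoint_debug(debugmode: bool) -> bool:
    """Server endpoint debug messages appear only when --debugmode is set."""
    return debugmode


def should_log_horde_prompt(hordeconfig_set: bool, debugmode: bool) -> bool:
    """Incoming horde prompts are hidden when --hordeconfig is set,
    unless --debugmode is also enabled."""
    return debugmode or not hordeconfig_set
```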

Update 1.31.2:

--hordeconfig can now also specify the max context length allowed in horde, which is separate from the real context length used to allocate memory.
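
To make the distinction concrete, the sketch below parses the two values independently. It is a standalone illustration, not koboldcpp's actual parser: the positional layout of --hordeconfig (model name, max generation length, max context length) and the 2048 default are assumptions, and `parse_context_settings` is a hypothetical helper.

```python
import argparse


def parse_context_settings(argv):
    """Return (allocated_ctx, horde_max_ctx); horde_max_ctx is None if not given."""
    parser = argparse.ArgumentParser()
    # Real context length used to allocate memory (default value is an assumption).
    parser.add_argument("--contextsize", type=int, default=2048)
    # Assumed layout: <model name> [max gen length] [max context length]
    parser.add_argument("--hordeconfig", nargs="+", default=None)
    args = parser.parse_args(argv)

    horde_max_ctx = None
    if args.hordeconfig and len(args.hordeconfig) > 2:
        # The horde limit is tracked separately from the allocated size.
        horde_max_ctx = int(args.hordeconfig[2])
    return args.contextsize, horde_max_ctx
```

Under these assumptions, `parse_context_settings(["--contextsize", "2048", "--hordeconfig", "MyModel", "256", "1024"])` keeps a 2048-token allocation while recording 1024 as the horde limit.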

LostRuins/koboldcpp v1.31.2 koboldcpp-1.31.2 on GitHub