koboldcpp-1.31.2
This is mostly a bugfix build, with some new features to Lite.
- Better EOS token handling for Starcoder models.
- Major Kobold Lite update, including new scenarios, a variety of bug fixes, italics chat text, customized idle message counts, and improved sentence trimming behavior.
- Disabled RWKV sequence mode. Unfortunately, the speedups were too situational, and some users experienced speed regressions. Additionally, it did not work without modifying the ggml library to increase the max node counts, which had adverse impacts on other model architectures. Sequence mode will remain disabled until it has been sufficiently improved upstream.
- The token generation rate is now displayed in the console.
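As a rough illustration of what a token generation rate means, the sketch below divides the number of generated tokens by the elapsed wall-clock time. This is not koboldcpp's actual code: the function names `tokens_per_second` and `report_rate` and the output format are hypothetical, chosen only for this example.

```python
def tokens_per_second(token_count, elapsed_seconds):
    """Return generated tokens per second, or 0.0 if no time has elapsed."""
    if elapsed_seconds <= 0:
        return 0.0
    return token_count / elapsed_seconds


def report_rate(token_count, start, end):
    """Build a console-style summary line (hypothetical format)."""
    elapsed = end - start
    rate = tokens_per_second(token_count, elapsed)
    return f"Generated {token_count} tokens in {elapsed:.2f}s ({rate:.2f} T/s)"
```

In practice, `start` and `end` would come from a monotonic clock such as Python's `time.perf_counter()`, read immediately before and after generation.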
Update 1.31.1:
- Cleaned up debug output. Server endpoint debug messages are now shown only if --debugmode is set. Also, incoming horde prompts are no longer shown when --hordeconfig is set, unless --debugmode is also enabled.
- Fixed markdown in Lite.
Update 1.31.2:
- Allowed --hordeconfig to also specify the max context length allowed in horde, which is separate from the real context length used to allocate memory.