github LostRuins/koboldcpp v1.43
koboldcpp-1.43



  • Re-added support for automatic rope scale calculations based on a model's training context (n_ctx_train), this triggers if you do not explicitly specify a --ropeconfig. For example, this means llama2 models will (by default) use a smaller rope scale compared to llama1 models, for the same specified --contextsize. Setting --ropeconfig will override this. This was bugged and removed in the previous release, but it should be working fine now.
  • If a GPU number is provided and no tensor split is specified, the HIP/CUDA visible devices are now restricted to that GPU only.
  • Fixed RWKV models being broken after recent upgrades.
  • Tweaked --unbantokens to decrease the banned token logit values further, as very rarely they could still appear. Still not using -inf as that causes issues with typical sampling.
  • Integrated SSE streaming improvements from @kalomaze.
  • Added a mutex for thread-safe polled-streaming, from @Elbios.
  • Added support for older GGML (ggjt_v3) for 34B llama2 models by @vxiiduu, note that this may still have issues if n_gqa is not 1, in which case using GGUF would be better.
  • Fixed support for Windows 7, which should work in noavx2 and failsafe modes again. Also, SSE3 flags are now enabled for failsafe mode.
  • Updated Kobold Lite, now uses placeholders for instruct tags that get swapped during generation.
  • Improved tab navigation order in the GUI launcher, though some elements such as checkboxes still require the mouse to toggle.
  • Pulled other fixes and improvements from upstream.
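The automatic rope scale calculation mentioned above can be illustrated with a simplified sketch of linear RoPE frequency scaling. This is an assumption for illustration only; koboldcpp's actual heuristic (and its handling of `--ropeconfig` overrides) may differ:

```python
def rope_freq_scale(n_ctx: int, n_ctx_train: int) -> float:
    """Linear RoPE scaling sketch: compress positions when the requested
    context exceeds the model's training context (n_ctx_train)."""
    if n_ctx <= n_ctx_train:
        return 1.0  # requested context fits; no scaling needed
    return n_ctx_train / n_ctx

# For the same --contextsize 8192, a llama2 model (trained at 4096)
# needs a larger scale factor than a llama1 model (trained at 2048):
print(rope_freq_scale(8192, 4096))  # → 0.5
print(rope_freq_scale(8192, 2048))  # → 0.25
```

This is why, by default, llama2 models end up with a smaller amount of rope compression than llama1 models at the same context size.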
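The --unbantokens tweak can be sketched as follows: banned token logits are pushed to a large negative but finite value, rather than -inf, since infinities can misbehave in downstream samplers such as typical sampling. The -50000.0 constant below is illustrative, not koboldcpp's actual value:

```python
import math

BAN_LOGIT = -50000.0  # large but finite; illustrative value only


def ban_tokens(logits: list[float], banned: set[int]) -> list[float]:
    """Suppress banned token ids without resorting to -inf."""
    return [BAN_LOGIT if i in banned else x for i, x in enumerate(logits)]


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax(ban_tokens([2.0, 1.0, 0.5], banned={1}))
# The banned token's probability is effectively zero, while all
# intermediate values stay finite for the sampler to work with.
```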
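The polled-streaming mutex addresses a classic shared-buffer race: the generation thread appends tokens while HTTP polls drain them. A minimal sketch of the pattern (the class and method names here are hypothetical, not koboldcpp's internals):

```python
import threading


class TokenStream:
    """Shared buffer between a generation thread and HTTP pollers; the
    lock prevents a poll from observing a half-updated buffer."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._tokens: list[str] = []

    def append(self, token: str) -> None:  # called by the generator
        with self._lock:
            self._tokens.append(token)

    def poll(self) -> str:  # called by each HTTP poll
        with self._lock:
            out = "".join(self._tokens)
            self._tokens.clear()
            return out


stream = TokenStream()
stream.append("Hel")
stream.append("lo")
print(stream.poll())  # → Hello
print(stream.poll())  # → "" (already drained)
```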

To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once the model is loaded, you can connect at http://localhost:5001 (or use the full KoboldAI client).
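Besides connecting with a browser or the KoboldAI client, you can call the KoboldAI-compatible HTTP API directly. A minimal sketch using only the standard library (the `max_length` of 80 is an arbitrary choice; this assumes the default port 5001):

```python
import json
import urllib.request

API_URL = "http://localhost:5001/api/v1/generate"  # default koboldcpp port


def build_payload(prompt: str, max_length: int = 80) -> bytes:
    # Minimal request body for the KoboldAI generate endpoint.
    return json.dumps({"prompt": prompt, "max_length": max_length}).encode()


def generate(prompt: str) -> str:
    req = urllib.request.Request(
        API_URL,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"][0]["text"]


# generate("Once upon a time")  # requires a running koboldcpp instance
```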

For more information, be sure to run the program from command line with the --help flag.

Of Note:

  • Reminder that HIPBLAS requires self compilation, and is not included by default in the prebuilt executables.
  • Remember that token unbans can now be set via API (and Lite) in addition to the command line.
