github LostRuins/koboldcpp v1.43
koboldcpp-1.43



  • Re-added support for automatic rope scale calculations based on a model's training context (n_ctx_train), this triggers if you do not explicitly specify a --ropeconfig. For example, this means llama2 models will (by default) use a smaller rope scale compared to llama1 models, for the same specified --contextsize. Setting --ropeconfig will override this. This was bugged and removed in the previous release, but it should be working fine now.
  • If a GPU number is provided and no tensor split is specified, the HIP/CUDA visible devices are now restricted to that GPU only.
  • Fixed RWKV models being broken after recent upgrades.
  • Tweaked --unbantokens to decrease the banned token logit values further, as very rarely they could still appear. Still not using -inf as that causes issues with typical sampling.
  • Integrated SSE streaming improvements from @kalomaze.
  • Added a mutex for thread-safe polled-streaming, from @Elbios.
  • Added support for older GGML (ggjt_v3) for 34B llama2 models by @vxiiduu, note that this may still have issues if n_gqa is not 1, in which case using GGUF would be better.
  • Fixed support for Windows 7, which should work in noavx2 and failsafe modes again. Also, SSE3 flags are now enabled for failsafe mode.
  • Updated Kobold Lite, now uses placeholders for instruct tags that get swapped during generation.
  • Improved tab navigation order in the GUI launcher, though some elements such as checkboxes still require the mouse to toggle.
  • Pulled other fixes and improvements from upstream.
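The automatic rope scale calculation mentioned above can be illustrated with a simplified sketch of linear RoPE frequency scaling. This is an assumption for illustration only; koboldcpp's actual heuristic (and its handling of `--ropeconfig` overrides) may differ:

```python
def rope_freq_scale(n_ctx: int, n_ctx_train: int) -> float:
    """Linear RoPE scaling sketch: compress positions when the requested
    context exceeds the model's training context (n_ctx_train)."""
    if n_ctx <= n_ctx_train:
        return 1.0  # requested context fits; no scaling needed
    return n_ctx_train / n_ctx

# For the same --contextsize 8192, a llama2 model (trained at 4096)
# needs a larger scale factor than a llama1 model (trained at 2048):
print(rope_freq_scale(8192, 4096))  # → 0.5
print(rope_freq_scale(8192, 2048))  # → 0.25
```

This is why, by default, llama2 models end up with a smaller amount of rope compression than llama1 models at the same context size.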
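The --unbantokens tweak can be sketched as follows: banned token logits are pushed to a large negative but finite value, rather than -inf, since infinities can misbehave in downstream samplers such as typical sampling. The -50000.0 constant below is illustrative, not koboldcpp's actual value:

```python
import math

BAN_LOGIT = -50000.0  # large but finite; illustrative value only


def ban_tokens(logits: list[float], banned: set[int]) -> list[float]:
    """Suppress banned token ids without resorting to -inf."""
    return [BAN_LOGIT if i in banned else x for i, x in enumerate(logits)]


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax(ban_tokens([2.0, 1.0, 0.5], banned={1}))
# The banned token's probability is effectively zero, while all
# intermediate values stay finite for the sampler to work with.
```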
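The polled-streaming mutex addresses a classic shared-buffer race: the generation thread appends tokens while HTTP polls drain them. A minimal sketch of the pattern (the class and method names here are hypothetical, not koboldcpp's internals):

```python
import threading


class TokenStream:
    """Shared buffer between a generation thread and HTTP pollers; the
    lock prevents a poll from observing a half-updated buffer."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._tokens: list[str] = []

    def append(self, token: str) -> None:  # called by the generator
        with self._lock:
            self._tokens.append(token)

    def poll(self) -> str:  # called by each HTTP poll
        with self._lock:
            out = "".join(self._tokens)
            self._tokens.clear()
            return out


stream = TokenStream()
stream.append("Hel")
stream.append("lo")
print(stream.poll())  # → Hello
print(stream.poll())  # → "" (already drained)
```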

To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once the model is loaded, you can connect at http://localhost:5001 (or use the full KoboldAI client).
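Besides connecting with a browser or the KoboldAI client, you can call the KoboldAI-compatible HTTP API directly. A minimal sketch using only the standard library (the `max_length` of 80 is an arbitrary choice; this assumes the default port 5001):

```python
import json
import urllib.request

API_URL = "http://localhost:5001/api/v1/generate"  # default koboldcpp port


def build_payload(prompt: str, max_length: int = 80) -> bytes:
    # Minimal request body for the KoboldAI generate endpoint.
    return json.dumps({"prompt": prompt, "max_length": max_length}).encode()


def generate(prompt: str) -> str:
    req = urllib.request.Request(
        API_URL,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"][0]["text"]


# generate("Once upon a time")  # requires a running koboldcpp instance
```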

For more information, be sure to run the program from command line with the --help flag.

Of Note:

  • Reminder that HIPBLAS requires self compilation, and is not included by default in the prebuilt executables.
  • Remember that token unbans can now be set via API (and Lite) in addition to the command line.
