koboldcpp-1.42.1

- Added support for LLaMA GGUFv2 models, handled automatically. All older models will continue to work normally.
- Fixed a problem with certain logit values that caused segfaults when using the Typical sampler. Please let me know if it happens again.
- Merged ROCm support from @YellowRoseCx, so you should now be able to build AMD-compatible GPU builds with hipBLAS, which should be faster than using CLBlast.
- Merged upstream support for GGUF Falcon models. Note that GPU layer offload for Falcon is unavailable with --useclblast but works with CUDA. Older pre-GGUF Falcon models are not supported.
- Added support for unbanning EOS tokens directly from the API, so it can now also be toggled from the Lite UI settings. Note: the --unbantokens command line flag, if set, overrides this.
- Added support for automatic rope scale calculation based on a model's training context (n_ctx_train), which triggers if you do not explicitly specify --ropeconfig. For example, for the same --contextsize, llama2 models will by default use a smaller rope scale than llama1 models. Setting --ropeconfig overrides this. (Reverted in 1.42.1 for now, because it was not set up correctly.)
- Updated Kobold Lite, now with Tavern-style portraits in Aesthetic Instruct mode.
- Pulled other fixes and improvements from upstream.
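The automatic rope scaling above was reverted, and its exact formula is not stated in these notes. Purely as an illustration of why the training context matters, here is a minimal Python sketch that assumes plain linear RoPE scaling: once the requested context exceeds n_ctx_train, positions are compressed by n_ctx_train / contextsize. The function names are hypothetical, and treating the first --ropeconfig value as the frequency scale is an assumption made for this sketch.

```python
# Hypothetical sketch, NOT koboldcpp's actual (reverted) implementation.
# Assumes plain linear RoPE scaling: freq_scale = n_ctx_train / contextsize
# when the requested context exceeds the model's training context.

def linear_rope_freq_scale(contextsize: int, n_ctx_train: int) -> float:
    """Return a linear rope frequency scale (1.0 means no scaling)."""
    if contextsize <= 0 or n_ctx_train <= 0:
        raise ValueError("context sizes must be positive")
    if contextsize <= n_ctx_train:
        return 1.0  # requested context fits the training context: no stretching
    return n_ctx_train / contextsize


def effective_rope_freq_scale(contextsize, n_ctx_train, ropeconfig=None):
    """An explicit (freq_scale, freq_base) override wins; otherwise auto-calculate.

    Assumption: ropeconfig mirrors --ropeconfig, with the frequency scale first.
    """
    if ropeconfig is not None:
        return ropeconfig[0]
    return linear_rope_freq_scale(contextsize, n_ctx_train)


if __name__ == "__main__":
    # llama1 was trained on 2048 tokens, llama2 on 4096.
    for name, n_train in [("llama1", 2048), ("llama2", 4096)]:
        print(name, effective_rope_freq_scale(8192, n_train))
```

Under this assumption, at a context size of 8192 a llama2 model needs only 2x stretching (scale 0.5), while a llama1 model needs 4x (scale 0.25), which is the intuition behind using less scaling for llama2.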

To use, download and run koboldcpp.exe, which is a one-file PyInstaller executable.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once the model is loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
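Besides the browser UI, the same server can be called programmatically, which is also how EOS unbanning can be requested per call. The following is a minimal Python sketch using only the standard library. It assumes the KoboldAI-style POST /api/v1/generate endpoint returning {"results": [{"text": ...}]}, and it assumes the request field use_default_badwordsids (false meaning EOS is not banned) controls the unbanning; check both against your build. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: default koboldcpp address and KoboldAI-style generate endpoint.
BASE_URL = "http://localhost:5001"


def build_generate_request(prompt, max_length=80, unban_eos=None, base_url=BASE_URL):
    """Build (but do not send) a POST request for text generation."""
    payload = {"prompt": prompt, "max_length": max_length}
    if unban_eos is not None:
        # Assumption: use_default_badwordsids=False lets the model emit EOS.
        # A --unbantokens launch flag would override this per-request setting.
        payload["use_default_badwordsids"] = not unban_eos
    return urllib.request.Request(
        base_url + "/api/v1/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    req = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["results"][0]["text"]


if __name__ == "__main__":
    print(generate("Once upon a time", max_length=50, unban_eos=True))
```

Building the request separately from sending it keeps the payload easy to inspect before any network call is made.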

For more information, run the program from the command line with the --help flag.
