koboldcpp-1.34.2

This is a BIG update. Changes:

  • Added a brand new customtkinter GUI which contains many more configurable settings. To use this new UI, the python module customtkinter is required on Linux and OSX (it is already included in the Windows .exe builds). The old GUI remains available otherwise. (Thanks: @Vali-98)
  • Switched to NTK-Aware scaling for RoPE, set based on the --contextsize parameter, with support for up to 8K context. This seems to perform much better than the previous dynamic linear method, even on untuned models. It still won't work perfectly for SuperHOT 8K, as that model requires a fixed 0.25 linear RoPE scale, but this approach seems better in general. Note that the alpha value is chosen based on your selected --contextsize, so for best results, only set a large --contextsize if you actually need it, as there will be minor perplexity loss otherwise. (See the RoPE sketch after this list.)
  • Extended NTK-Aware scaled RoPE support to GPT-NeoX and GPT-J too! Surprisingly, long context works decently even with these older models, so you can enjoy something like Pyg6B or Pythia with 4K context if you like.
  • Added /generate API support for sampler_order and mirostat/tau/eta parameters, which you can now set per-generation (see the example request after this list). (Thanks: @ycros)
  • Added --bantokens, which allows you to specify a list of token substrings that the AI cannot use. For example, --bantokens [ a ooo prevents the AI from using any left square brackets, the letter a, or any token containing the substring ooo. This bans all instances of matching tokens! (A sketch of this logit-masking idea follows the list.)
  • Added more granular context size options; you can now select 3K and 6K context sizes as well.
  • Added the ability to select the Main GPU when using CUDA. For example, --usecublas lowvram 2 will use the third Nvidia GPU, if it exists.
  • Pulled updates from RWKV.cpp, giving a minor speedup for prompt processing.
  • Fixed build issues on certain older and OSX platforms; GCC 7 should now be supported. Please report any build issues you find.
  • Pulled fixes and updates from upstream, and updated Kobold Lite. Kobold Lite now allows you to view submitted contexts after each generation, and also includes two new scenarios plus limited support for Tavern v2 cards.
  • Adjusted scratch buffer sizes for big contexts, so unexpected segfaults/OOM errors should be less common (please report any you find). CUDA scratch buffers should also work better now (upstream fix).
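
For the curious, here is a minimal sketch of the NTK-Aware RoPE technique mentioned above. This is not koboldcpp's actual code; it just illustrates the core trick of scaling the rotary base frequency by alpha^(d/(d-2)) rather than compressing positions. The alpha here is a free parameter (koboldcpp picks it internally from --contextsize).

```python
# Illustrative sketch of NTK-Aware scaled RoPE (not koboldcpp's actual code).
# The trick: raise the rotary base frequency instead of shrinking positions,
# so high-frequency dimensions stay nearly intact while low-frequency ones
# stretch to cover the longer context.
import numpy as np

def rope_frequencies(head_dim: int, base: float = 10000.0, alpha: float = 1.0) -> np.ndarray:
    """Per-pair rotation frequencies, with NTK-Aware alpha scaling of the base."""
    base = base * alpha ** (head_dim / (head_dim - 2))  # NTK-Aware adjustment
    return base ** (-np.arange(0, head_dim, 2) / head_dim)

def apply_rope(x: np.ndarray, position: int, alpha: float = 1.0) -> np.ndarray:
    """Rotate one (head_dim,) query/key vector for a given token position."""
    half = x.shape[-1] // 2
    angles = position * rope_frequencies(x.shape[-1], alpha=alpha)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:half], x[half:]  # GPT-NeoX style pairing of dimensions
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])
```

With alpha = 1 this reduces to plain RoPE; larger alpha values trade a little short-context perplexity for usable long context, which is why you should only pick a big --contextsize when you need it.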
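
And here is a hedged example of using the new per-generation sampler parameters on the /generate endpoint. It assumes the server is running on the default port; the parameter values (and the sampler_order shown, which mirrors the usual KoboldAI default order) are illustrative, not recommendations.

```python
# Example call to the KoboldAI-compatible generate endpoint with the new
# per-generation sampler_order and mirostat parameters.
import json
from urllib import request

payload = {
    "prompt": "Once upon a time,",
    "max_length": 80,
    "sampler_order": [6, 0, 1, 3, 4, 2, 5],  # order in which samplers apply
    "mirostat": 2,        # 0 = off, 1 or 2 = mirostat version
    "mirostat_tau": 5.0,  # target surprise (entropy)
    "mirostat_eta": 0.1,  # learning rate
}
req = request.Request(
    "http://localhost:5001/api/v1/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with request.urlopen(req) as resp:
    body = json.loads(resp.read())
    print(body["results"][0]["text"])  # response shape per the KoboldAI API
```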
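
Finally, a conceptual sketch of what --bantokens does: this is not the actual implementation, just the general logit-masking idea, where any vocabulary token whose text contains a banned substring is made unsampleable.

```python
# Conceptual sketch of substring token banning (not koboldcpp's actual code).
def build_ban_list(vocab: list[str], banned_substrings: list[str]) -> set[int]:
    """Collect the ids of every token whose text contains a banned substring."""
    return {
        token_id
        for token_id, text in enumerate(vocab)
        if any(sub in text for sub in banned_substrings)
    }

def apply_bans(logits: list[float], banned_ids: set[int]) -> list[float]:
    """Force banned tokens to -inf before sampling so they are never picked."""
    for token_id in banned_ids:
        logits[token_id] = float("-inf")
    return logits

# e.g. build_ban_list(vocab, ["[", "a", "ooo"]) bans every matching token,
# mirroring the --bantokens [ a ooo example above.
```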

1.34.1a Hotfix (CUDA): CUDA was completely broken; did a quick revert to get it working. Will upload a proper build later.
1.34.2 Hotfix: CUDA kernels are now updated to the latest version; used Python to handle the GPU selection instead.

To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
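
As a quick sanity check that the server is up, you can also hit the KoboldAI-compatible API from a script (a minimal sketch, assuming the default port):

```python
# Minimal connectivity check against a running koboldcpp instance,
# assuming the default port 5001 and the KoboldAI-compatible API.
from urllib import request

with request.urlopen("http://localhost:5001/api/v1/model") as resp:
    print(resp.read().decode("utf-8"))  # should report the loaded model
```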

For more information, be sure to run the program from command line with the --help flag.
