koboldcpp-1.42.1
- Added support for LLAMA GGUFv2 models, handled automatically. All older models will still continue to work normally.
- Fixed a problem with certain logit values that were causing segfaults when using the Typical sampler. Please let me know if it happens again.
- Merged ROCm support from @YellowRoseCx, so you should now be able to build AMD-compatible GPU builds with HIPBLAS, which should be faster than using CLBlast.
- Merged upstream support for GGUF Falcon models. Note that GPU layer offload for Falcon is unavailable with --useclblast but works with CUDA. Older pre-gguf Falcon models are not supported.
- Added support for unbanning EOS tokens directly from API, and by extension it can now be triggered from Lite UI settings. Note: Your command line --unbantokens flag will force override this.
- Added support for automatic rope scale calculations based on a model's training context (n_ctx_train), which triggers if you do not explicitly specify a --ropeconfig. For example, this means llama2 models will (by default) use a smaller rope scale compared to llama1 models, for the same specified --contextsize. Setting --ropeconfig will override this. (Reverted in 1.42.1 for now, it was not set up correctly.)
- Updated Kobold Lite, now with tavern style portraits in Aesthetic Instruct mode.
- Pulled other fixes and improvements from upstream.
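To give a rough idea of what the automatic rope scale item above was aiming for, here is a minimal sketch. It assumes a plain linear ratio between the requested context size and the model's training context; the actual calculation in koboldcpp is not described in these notes and may well differ (and the feature is reverted in 1.42.1 anyway). The function names and the stretch-factor convention are hypothetical, used for illustration only. The training context values of 2048 for llama1 and 4096 for llama2 are the published ones.

```python
# Illustrative sketch only. The linear ratio and every name below are
# assumptions; this is not koboldcpp's actual implementation.

LLAMA1_N_CTX_TRAIN = 2048  # training context of llama1 models
LLAMA2_N_CTX_TRAIN = 4096  # training context of llama2 models


def auto_rope_stretch(context_size: int, n_ctx_train: int) -> float:
    """Return how far positions are stretched beyond the training context.

    1.0 means no scaling is needed; 2.0 means the requested context is
    twice as long as the context the model was trained on.
    """
    if context_size <= 0 or n_ctx_train <= 0:
        raise ValueError("context sizes must be positive")
    # Never shrink below 1.0: a context shorter than the training
    # context needs no scaling at all.
    return max(1.0, context_size / n_ctx_train)


def effective_rope_stretch(context_size, n_ctx_train, ropeconfig_stretch=None):
    """An explicitly supplied value (as with --ropeconfig) wins over the automatic one."""
    if ropeconfig_stretch is not None:
        return ropeconfig_stretch
    return auto_rope_stretch(context_size, n_ctx_train)


if __name__ == "__main__":
    for name, n_ctx_train in (("llama1", LLAMA1_N_CTX_TRAIN),
                              ("llama2", LLAMA2_N_CTX_TRAIN)):
        print(name, auto_rope_stretch(4096, n_ctx_train))
```

Under this assumption, a context size of 4096 stretches a llama1 model by a factor of 2.0 but leaves a llama2 model at 1.0, which matches the "smaller rope scale for llama2" behaviour described above.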
To use, download and run koboldcpp.exe, which is a one-file PyInstaller executable.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
For more information, be sure to run the program from the command line with the --help flag.
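
If you would rather talk to the server from a script than from the browser, something like the following should work as a starting point. It is a minimal sketch that assumes the server exposes the KoboldAI-style /api/v1/generate endpoint on the address above and answers with a "results" list. The use_default_badwordsids field, used here for the EOS unbanning mentioned in the changelog, is my assumption about the field name, so check it against your build. The helper names build_generate_request and generate are made up for this example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5001"  # default address from the notes above


def build_generate_request(prompt, max_length=80, unban_eos=False,
                           base_url=BASE_URL):
    """Build (but do not send) a POST request for the generate endpoint."""
    payload = {"prompt": prompt, "max_length": max_length}
    if unban_eos:
        # Assumption: setting this field to False asks the server not to
        # ban the EOS token. The field name is not confirmed by these notes.
        payload["use_default_badwordsids"] = False
    return urllib.request.Request(
        base_url + "/api/v1/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    # Assumption: the reply has the shape {"results": [{"text": "..."}]}.
    return body["results"][0]["text"]
```

With koboldcpp running and a model loaded, calling generate("Once upon a time,") should return the continuation as a string. Remember that if you launched with --unbantokens, that flag overrides whatever the request asks for.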