LostRuins/koboldcpp koboldcpp-1.22-CUDA-ONLY on GitHub

koboldcpp-1.22-CUDA-ONLY (Special Edition)

A.K.A The "Look what you made me do" edition.

Changes:

This is a (one-off?) limited edition CUDA-only build.
Only NVIDIA GPUs will work with this build.
This build does not support CLBlast or OpenBLAS. Selecting the OpenBLAS or CLBlast options still loads CUBLAS.
This build does not support running old quantization formats (this is a limitation of the upstream CUDA kernel).
This build DOES support GPU offloading via CUBLAS. To use that feature, select the number of layers to offload, e.g. --gpulayers 32
This build is very large because of the CUBLAS libraries bundled with it. It requires CUDA Runtime support for 11.8 and up.
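
For anyone scripting the launch, here is a minimal sketch of how the offloading option could be assembled into a command line. It assumes the model path can be passed as the first argument (which is what dragging a file onto the .exe does). The helper name `build_launch_command` and the file name `model.bin` are placeholders, not part of koboldcpp.

```python
def build_launch_command(exe_path, model_path, gpu_layers):
    """Return an argument list that launches the build with GPU offloading.

    Hypothetical helper: only --gpulayers comes from the release notes;
    passing the model as the first argument is an assumption.
    """
    if gpu_layers < 0:
        raise ValueError("gpu_layers must be non-negative")
    return [exe_path, model_path, "--gpulayers", str(gpu_layers)]


# Placeholder paths; replace with your own exe and ggml model locations.
cmd = build_launch_command("koboldcpp_CUDA_only.exe", "model.bin", 32)
print(" ".join(cmd))

# To actually start the program (requires the files to exist):
# import subprocess
# subprocess.run(cmd, check=True)
```

How many layers fit depends on your model and available VRAM, so 32 is only the example value from the notes above.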

For those who want the previous version, please find v1.21.3 here: https://github.com/LostRuins/koboldcpp/releases/tag/v1.21.3

To use, download and run koboldcpp_CUDA_only.exe, which is a one-file PyInstaller executable.
Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.

Once the model has loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
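
If you want to confirm from a script that the server is up before connecting, a rough check like the one below may help. It only tests whether something answers HTTP at the address above and makes no assumptions about koboldcpp's API. The function name `server_is_up` is illustrative.

```python
import urllib.error
import urllib.request


def server_is_up(url="http://localhost:5001", timeout=2.0):
    """Return True if an HTTP server answers at the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or the host could not be resolved.
        return False


if __name__ == "__main__":
    print("server reachable:", server_is_up())
```

A False result right after launch may just mean the model is still loading, so retrying after a short wait is reasonable.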

For more information, be sure to run the program with the --help flag.
This release also includes a zip file containing the libraries and the koboldcpp.py script, for those who prefer not to use the one-file PyInstaller executable.
