koboldcpp-1.74
Kobo's all grown up now
- NEW: Added the XTC (Exclude Top Choices) sampler, a brand-new creative writing sampler designed by the author of DRY (@p-e-w). To use it, increase `xtc_probability` above 0 (recommended values to try: `xtc_threshold=0.15`, `xtc_probability=0.5`).
- Added automatic image resizing and letterboxing for llava/minicpm images; this should improve handling of oddly-sized images.
- Added a new flag `--nomodel`, which allows launching the Lite WebUI without loading any model at all. You can then select an external API provider such as Horde, Gemini, or OpenAI.
- MacOS now defaults to full offload when `-1` gpulayers is selected.
- Minor tweaks to context shifting thresholds.
- Horde Worker now has a 5-minute timeout for each request, which should reduce the likelihood of getting stuck (e.g. due to internet issues). The Horde worker also now supports connecting to SSL-secured Kcpp instances (remember to enable `--nocertify` if using self-signed certs).
- Updated Kobold Lite, with multiple fixes and improvements.
- Merged fixes and improvements from upstream (plus Llama-3.1-Minitron-4B-Width support)
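To make the XTC settings above concrete, here is a minimal Python sketch of the idea behind the sampler: on a random fraction of steps (`xtc_probability`), every "viable" token at or above `xtc_threshold` is removed except the least likely of them. The function name and the plain-list interface are illustrative only, not koboldcpp's actual implementation.

```python
import random

def xtc_filter(probs, xtc_threshold=0.15, xtc_probability=0.5, rng=random):
    """Exclude Top Choices (XTC), sketched: with probability
    `xtc_probability`, drop every token at or above `xtc_threshold`
    except the least likely of them, then renormalize."""
    # The sampler only activates on a fraction of sampling steps.
    if rng.random() >= xtc_probability:
        return list(probs)
    # Indices of the "top choices" (tokens at or above the threshold).
    above = [i for i, p in enumerate(probs) if p >= xtc_threshold]
    if len(above) < 2:
        # With fewer than two viable tokens there is nothing safe to exclude.
        return list(probs)
    # The least likely viable token survives; the rest are zeroed out.
    keep = min(above, key=lambda i: probs[i])
    out = [0.0 if (i in above and i != keep) else p
           for i, p in enumerate(probs)]
    total = sum(out)
    return [p / total for p in out]
```

With `xtc_probability=0` the distribution always passes through unchanged, which is why the release notes say to raise it above 0 to enable the sampler.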
To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
If you have an Nvidia GPU, but use an old CPU and koboldcpp.exe does not work, try koboldcpp_oldcpu.exe
If you have a newer Nvidia GPU, you can use the CUDA 12 version koboldcpp_cu12.exe (much larger, slightly faster).
If you're using Linux, select the appropriate Linux binary file instead (not exe).
If you're on a modern MacOS (M1, M2, M3), you can try the koboldcpp-mac-arm64 MacOS binary.
If you're using AMD, you can try koboldcpp_rocm at YellowRoseCx's fork here
Run it from the command line with the desired launch parameters (see `--help`), or manually select the model in the GUI.
Once the model is loaded, you can connect at http://localhost:5001 (or use the full KoboldAI client).
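Once connected, the new XTC settings can also be supplied per request over the HTTP API. This is a hedged sketch that only builds and prints the JSON body: the `/api/v1/generate` endpoint path and the `xtc_*` field names are assumptions based on the samplers this release adds, so verify them against your version before relying on them.

```python
import json

# Hypothetical generation request for a running KoboldCpp instance.
payload = {
    "prompt": "Once upon a time,",
    "max_length": 120,
    "xtc_threshold": 0.15,   # recommended starting value from the notes
    "xtc_probability": 0.5,  # 0 disables XTC entirely
}
body = json.dumps(payload).encode("utf-8")

# To actually send it (server running on the default port), something like:
#   import urllib.request
#   req = urllib.request.Request("http://localhost:5001/api/v1/generate",
#                                data=body,
#                                headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read().decode())
print(body.decode())
```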
For more information, be sure to run the program from the command line with the `--help` flag.