koboldcpp-1.66.1
Phi guess that's the way the cookie crumbles edition
- NEW: Added custom SD LoRA support! Specify it with --sdlora and set the LoRA multiplier with --sdloramult. Note that SD LoRAs can only be used when loading in 16bit (e.g. with the .safetensors model) and will not work on quantized models (so incompatible with --sdquant).
- NEW: Added custom SD VAE support, which can be specified in the Image Gen tab of the GUI launcher, or using --sdvae [vae_file.safetensors]
- NEW: Added in-built support for TAE SD for SD1.5 and SDXL. This is a very small VAE replacement that can be used if a model has a broken VAE; it also works faster than a regular VAE. To use it, select the "Fix Bad VAE" checkbox or use the flag --sdvaeauto
- Note: Do not use the above new flags with --sdconfig, which is a deprecated flag and not to be used.
- NEW: Added experimental support for Rep Pen Slope. This is not a true slope, but the end result is it applies a slightly reduced rep pen for older tokens within the rep pen range, scaled by the slope value. Setting rep pen slope to 1 negates this effect. For compatibility reasons, rep pen slope defaults to 1 if unspecified (same behavior as before).
- NEW: You can now specify an http/https URL to a GGUF file when passing the --model parameter, or in the model selector UI. KoboldCpp will attempt to download the model file into your current working directory, and automatically load it when the download is done.
- Disabled UI launcher scaling on MacOS due to display issues. Please report any further scaling issues.
- Improved EOT token handling, fixed a bug in token speed calculations.
- Default thread count will not exceed 8 unless overridden; this helps mitigate e-core issues.
- Merged improvements and fixes from upstream, including new Phi support and Vulkan fixes from @0cc4m
- Updated Kobold Lite:
  - Now attempts to function correctly if hosted on a subdirectory URL path (e.g. using a reverse proxy). If that fails, it defaults back to the root URL.
  - Changed default chatmode player name from "You" to "User", which solves some wonky phrasing issues.
  - Added viewport width controls in settings, including horizontal fullscreen.
  - Minor bugfixes for markdown
Fix for 1.66.1 - Fixed quant tools makefile, fixed sd seed parsing, updated lite
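
To make the image generation flag rules above concrete, here is a minimal Python sketch that assembles the flags and rejects the one combination the notes explicitly rule out (--sdlora together with --sdquant). The helper name `build_sd_args`, the file names, and the use of --sdmodel to select the image model are assumptions for illustration and are not taken from these notes; check --help for the exact flags in your build.

```python
from typing import List, Optional


def build_sd_args(
    sd_model: str,
    lora: Optional[str] = None,
    lora_mult: Optional[float] = None,
    vae: Optional[str] = None,
    vae_auto: bool = False,
    quant: bool = False,
) -> List[str]:
    """Assemble image-generation launch flags (hypothetical helper)."""
    if lora and quant:
        # Per the notes: SD LoRAs will not work on quantized models.
        raise ValueError("--sdlora cannot be combined with --sdquant")

    # Assumption: --sdmodel selects the image model; it is not named in these notes.
    args = ["--sdmodel", sd_model]
    if lora:
        args += ["--sdlora", lora]
        if lora_mult is not None:
            args += ["--sdloramult", str(lora_mult)]
    if vae:
        args += ["--sdvae", vae]
    if vae_auto:
        # Same effect as the "Fix Bad VAE" checkbox described in the notes.
        args.append("--sdvaeauto")
    if quant:
        args.append("--sdquant")
    return args
```

For example, `build_sd_args("model.safetensors", lora="style.safetensors", lora_mult=0.8)` returns a list that can be appended to the rest of the launch command, while passing both `lora` and `quant=True` raises an error before anything is launched.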
To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
If you have a newer Nvidia GPU, you can use the CUDA 12 version koboldcpp_cu12.exe (much larger, slightly faster).
If you're using Linux, select the appropriate Linux binary file instead (not exe).
If you're using AMD, you can try koboldcpp_rocm at YellowRoseCx's fork here
Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
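
As a rough illustration of launching with the --model parameter, the Python sketch below builds a launch command for either a local file or an http/https URL to a GGUF file. The function names, the example URL, and the `.gguf` suffix check are assumptions of this sketch; the check is only a local sanity test and does not describe how KoboldCpp itself validates the argument.

```python
from typing import List
from urllib.parse import urlparse


def is_gguf_url(model: str) -> bool:
    """Return True if `model` looks like an http/https URL to a .gguf file."""
    parsed = urlparse(model)
    return parsed.scheme in ("http", "https") and parsed.path.lower().endswith(".gguf")


def build_launch_command(binary: str, model: str) -> List[str]:
    """Build the argument list; a URL and a local path are passed the same way."""
    return [binary, "--model", model]


# Hypothetical URL, used only to show the shape of the command.
command = build_launch_command("koboldcpp.exe", "https://example.com/models/tiny.gguf")
```

The resulting list can be handed to `subprocess.run(command)` from the directory where you want a downloaded model to be saved, since the notes say the file goes into the current working directory.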
Once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
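
Besides opening that address in a browser, you can talk to the server from a script. The sketch below assumes the KoboldAI-style `/api/v1/generate` endpoint and the field names `rep_pen`, `rep_pen_range` and `rep_pen_slope`; neither is spelled out in these notes, so treat them as assumptions and check the API documentation served by your instance. The default values are illustrative only.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5001"


def build_payload(prompt, max_length=80, rep_pen=1.1, rep_pen_range=320, rep_pen_slope=1.0):
    """Build a generation request body (field names are assumed, see above)."""
    return {
        "prompt": prompt,
        "max_length": max_length,
        "rep_pen": rep_pen,
        "rep_pen_range": rep_pen_range,
        # A slope of 1 keeps the previous behavior, per the notes.
        "rep_pen_slope": rep_pen_slope,
    }


def generate(prompt, base_url=BASE_URL, **settings):
    """POST the payload to a running server and return the generated text."""
    request = urllib.request.Request(
        base_url + "/api/v1/generate",
        data=json.dumps(build_payload(prompt, **settings)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumed response shape: {"results": [{"text": "..."}]}
    return body["results"][0]["text"]
```

With a model loaded and the server running, `generate("Once upon a time", rep_pen_slope=0.7)` would request a completion with a reduced repetition penalty on older tokens.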
For more information, be sure to run the program from command line with the --help flag.