koboldcpp-1.66.1

Phi guess that's the way the cookie crumbles edition

NEW: Added custom SD LoRA support! Specify it with --sdlora and set the LoRA multiplier with --sdloramult. Note that SD LoRAs can only be used when the model is loaded in 16-bit (e.g. with the .safetensors model) and will not work on quantized models, so they are incompatible with --sdquant.
NEW: Added custom SD VAE support, which can be specified in the Image Gen tab of the GUI launcher, or using --sdvae [vae_file.safetensors]
NEW: Added built-in support for TAE SD for SD1.5 and SDXL. This is a very small VAE replacement that can be used if a model has a broken VAE; it also runs faster than a regular VAE. To use it, select the "Fix Bad VAE" checkbox or pass the flag --sdvaeauto
- Note: Do not combine the new flags above with --sdconfig, which is deprecated and should no longer be used.
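As a rough illustration of the flag rules above, here is a minimal Python sketch that assembles the new image-generation flags and rejects the combinations these notes warn about. The helper name build_sd_flags and its parameters are hypothetical and not part of KoboldCpp; only the flag names come from these notes. Preferring an explicit --sdvae over --sdvaeauto is a choice made by this sketch, not documented behavior.

```python
from typing import List, Optional


def build_sd_flags(existing_args: List[str],
                   lora: Optional[str] = None,
                   lora_mult: Optional[float] = None,
                   vae: Optional[str] = None,
                   vae_auto: bool = False) -> List[str]:
    """Return the extra SD flags to append to an existing argument list.

    Hypothetical helper for illustration only; KoboldCpp does its own
    argument handling.
    """
    # --sdconfig is deprecated and must not be mixed with the new flags.
    if "--sdconfig" in existing_args:
        raise ValueError("do not combine the new SD flags with --sdconfig")
    # SD LoRAs only work on 16-bit models, so they conflict with --sdquant.
    if lora and "--sdquant" in existing_args:
        raise ValueError("--sdlora is incompatible with --sdquant")

    flags: List[str] = []
    if lora:
        flags += ["--sdlora", lora]
        if lora_mult is not None:
            flags += ["--sdloramult", str(lora_mult)]
    if vae:
        # Assumption of this sketch: an explicit VAE file takes priority.
        flags += ["--sdvae", vae]
    elif vae_auto:
        flags.append("--sdvaeauto")
    return flags
```

The returned list is meant to be appended to whatever command line already loads the image model.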
NEW: Added experimental support for Rep Pen Slope. This is not a true slope, but the end result is that it applies a slightly reduced rep pen to older tokens within the rep pen range, scaled by the slope value. Setting rep pen slope to 1 negates this effect. For compatibility reasons, rep pen slope defaults to 1 if unspecified (same behavior as before).
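To make the idea concrete, here is a small Python sketch of one plausible way a slope value could reduce the penalty for older tokens. This is an assumption for illustration, not KoboldCpp's actual formula: it treats the older half of the rep pen range as "older tokens" and pulls their penalty toward 1.0 (no penalty) in proportion to the slope. The function name per_token_penalty is hypothetical.

```python
from typing import List


def per_token_penalty(rep_pen: float, slope: float, range_len: int) -> List[float]:
    """Penalty for each position in the rep pen range, oldest token first.

    ASSUMPTION (not taken from the KoboldCpp source): tokens in the older
    half of the range receive a reduced penalty, the newer half keeps the
    full rep_pen value.
    """
    # slope == 1 leaves the penalty unchanged; smaller values move it toward 1.0.
    reduced = 1.0 + (rep_pen - 1.0) * slope
    older = range_len // 2
    return [reduced if i < older else rep_pen for i in range(range_len)]
```

Under this assumption, a slope of 1 yields the same penalty for every token in the range, which matches the note that a slope of 1 negates the effect.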
NEW: You can now specify an http/https URL to a GGUF file when passing the --model parameter, or in the model selector UI. KoboldCpp will attempt to download the model file into your current working directory, and automatically load it when the download is done.
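The URL handling described above can be sketched roughly as follows. This is only an illustration of the idea (detect an http/https URL, derive a file name, and target the current working directory); the function name resolve_model_arg is hypothetical, and KoboldCpp's real logic may differ. The sketch only computes the destination path and does not download anything.

```python
import os
from typing import Tuple
from urllib.parse import unquote, urlparse


def resolve_model_arg(model_arg: str, workdir: str) -> Tuple[str, bool]:
    """Return (local_path, needs_download) for a --model value.

    Hypothetical helper: URLs map to a file in `workdir`, anything else is
    treated as an existing local path and returned unchanged.
    """
    parsed = urlparse(model_arg)
    if parsed.scheme in ("http", "https"):
        # Use the last path component as the file name, ignoring the query string.
        filename = os.path.basename(unquote(parsed.path))
        if not filename:
            raise ValueError("URL does not end in a file name: " + model_arg)
        return os.path.join(workdir, filename), True
    return model_arg, False
```

For example, calling it with a URL and os.getcwd() gives the path where the downloaded GGUF file would be expected under these assumptions.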
Disabled UI launcher scaling on macOS due to display issues. Please report any further scaling issues.
Improved EOT token handling, fixed a bug in token speed calculations.
Default thread count will not exceed 8 unless overridden; this helps mitigate e-core issues.
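A minimal Python sketch of the capping rule, assuming the default is simply the detected count clamped to 8. The function name pick_thread_count is hypothetical, and KoboldCpp's real detection heuristic may take other factors into account.

```python
from typing import Optional

DEFAULT_THREAD_CAP = 8  # cap stated in these release notes


def pick_thread_count(detected: int, override: Optional[int] = None) -> int:
    """Choose a thread count: an explicit override wins, otherwise cap at 8.

    Hypothetical helper for illustration only.
    """
    if override is not None:
        return override
    # Never go below one thread, never above the default cap.
    return max(1, min(detected, DEFAULT_THREAD_CAP))
```

On a CPU reporting 16 threads, this sketch would default to 8, while an explicit override of 12 would be respected.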
Merged improvements and fixes from upstream, including new Phi support and Vulkan fixes from @0cc4m.
Updated Kobold Lite:
- Now attempts to function correctly if hosted on a subdirectory URL path (e.g. using a reverse proxy). If that fails, it falls back to the root URL.
- Changed default chatmode player name from "You" to "User", which solves some wonky phrasing issues.
- Added viewport width controls in settings, including horizontal fullscreen.
- Minor bugfixes for markdown

Fix for 1.66.1 - Fixed quant tools makefile, fixed SD seed parsing, updated Lite.

To use, download and run koboldcpp.exe, which is a one-file PyInstaller executable.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
If you have a newer Nvidia GPU, you can use the CUDA 12 version koboldcpp_cu12.exe (much larger, slightly faster).
If you're using Linux, select the appropriate Linux binary file instead (not exe).
If you're using AMD, you can try koboldcpp_rocm at YellowRoseCx's fork here

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once loaded, you can connect at the following address (or use the full KoboldAI client):
http://localhost:5001
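If you prefer to check the connection from a script rather than a browser, a minimal Python sketch could look like this. It assumes the server exposes the KoboldAI-style /api/v1/model endpoint and returns JSON with a "result" field; that endpoint is not described in these notes, so treat it as an assumption. The helper names endpoint and fetch_model_name are hypothetical.

```python
import json
import urllib.request
from typing import Optional


def endpoint(base_url: str, path: str) -> str:
    """Join a base URL and an API path with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def fetch_model_name(base_url: str = "http://localhost:5001",
                     timeout: float = 5.0) -> Optional[str]:
    """Return the loaded model name, or None if the server cannot be reached.

    ASSUMPTION: /api/v1/model exists and replies with {"result": "<name>"}.
    """
    url = endpoint(base_url, "/api/v1/model")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp).get("result")
    except (OSError, ValueError):
        # OSError covers connection failures; ValueError covers invalid JSON.
        return None
```

Calling fetch_model_name() after the model has loaded should return a name if the assumptions hold, and None if nothing is listening on that port.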

For more information, be sure to run the program from command line with the --help flag.

LostRuins/koboldcpp v1.66 koboldcpp-1.66.1 on GitHub
