koboldcpp-1.60.1
KoboldCpp is just a 'Dirty Fork' edition
- KoboldCpp now natively supports Local Image Generation, thanks to the phenomenal work done by @leejet in stable-diffusion.cpp! It provides an A1111-compatible `txt2img` endpoint which you can use within the embedded Kobold Lite, or in many other compatible frontends such as SillyTavern.
- Just select a compatible SD1.5 or SDXL `.safetensors` fp16 model to load, either through the GUI launcher or with `--sdconfig`.
- Enjoy zero-install, portable, lightweight and hassle-free image generation directly from KoboldCpp, without installing multiple GBs' worth of ComfyUI, A1111, Fooocus or others.
- With just an 8GB VRAM GPU, you can run a 7B q4 GGUF (lowvram) alongside any SD1.5 image model at the same time, as a single fully offloaded instance. If you run out of VRAM, select `Compress Weights (quant)` to quantize the image model so it takes less memory.
- KoboldCpp allows you to run in text-gen-only, image-gen-only or hybrid modes; simply set the appropriate launcher configs.
- Known to not work correctly in Vulkan (for now).
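Since the endpoint is A1111-compatible, it can be driven like any other A1111 backend. The sketch below builds a minimal request; the route `/sdapi/v1/txt2img` and the payload fields follow standard A1111 API conventions and are assumptions here, not quoted from these notes:

```python
import base64
import json
from urllib import request

# Assumed defaults: KoboldCpp on port 5001, standard A1111 txt2img route.
SD_URL = "http://localhost:5001/sdapi/v1/txt2img"

def build_txt2img_request(prompt: str, steps: int = 20) -> request.Request:
    """Build a POST request carrying a minimal A1111-style payload."""
    payload = {
        "prompt": prompt,
        "negative_prompt": "",
        "width": 512,    # SD1.5-native resolution
        "height": 512,
        "steps": steps,
        "cfg_scale": 7.0,
    }
    return request.Request(
        SD_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_txt2img_request("a watercolor lighthouse at dusk")
# With a running instance, the response holds base64-encoded images:
# with request.urlopen(req) as resp:
#     img_b64 = json.loads(resp.read())["images"][0]
#     open("out.png", "wb").write(base64.b64decode(img_b64))
```

The same payload works from SillyTavern or any frontend that speaks the A1111 protocol.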
- When running from the command line, `--contextsize` can now be set to any arbitrary number within range instead of being locked to fixed values. However, using a non-recommended value may result in incoherent output depending on your settings. The GUI launcher for this remains unchanged.
- Added new quant types; pulled and merged improvements and fixes from upstream.
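For example, a launch line using an in-between context size might look like the following; the model filename is a placeholder and 3072 is just an illustrative in-range value (check `--help` for exact flag spellings):

```shell
# Hypothetical example: 3072 is accepted even though it is not one of the
# fixed GUI presets; unusual values may produce incoherent output.
./koboldcpp --model mymodel.gguf --contextsize 3072
```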
- Fixed some issues loading older GGUFv1 models, they should be working again.
- Added Cloudflare tunnel support for macOS (via `--remotetunnel`; however, it probably won't work on M1, only amd64).
- Updated API docs and Colab for image gen.
- Updated Kobold Lite:
  - Integrated support for AllTalk TTS
  - Added "Auto Jailbreak" for instruct mode, useful to wrangle stubborn or censored models.
  - Auto-enable the image gen button if KCPP loads an image model
  - Improved autoscroll and layout; defaults to SSE streaming mode
  - Added option to import and export the story via clipboard
  - Added option to set personal notes/comments in the story
Update v1.60.1: Ported the fix for CVE-2024-21836 for GGUFv1, enabled the LCM sampler, allowed loading gguf SD models, and fixed SD for Metal.
To use, download and run koboldcpp.exe, which is a one-file pyinstaller build.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller.
If you're using AMD, you can try koboldcpp_rocm at YellowRoseCx's fork here.
Run it from the command line with the desired launch parameters (see `--help`), or manually select the model in the GUI. Once loaded, you can connect like this (or use the full koboldai client):
http://localhost:5001
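The same port also serves the KoboldAI-style text API. The sketch below builds a request for the usual `/api/v1/generate` route; the route and the sampler fields follow standard Kobold API conventions and are assumptions here, not taken from these notes:

```python
import json
from urllib import request

# Assumed default endpoint for a local KoboldCpp instance.
API_URL = "http://localhost:5001/api/v1/generate"

def build_generate_request(prompt: str, max_length: int = 64) -> request.Request:
    """Build a POST request for the KoboldAI-style generate endpoint."""
    payload = {"prompt": prompt, "max_length": max_length, "temperature": 0.7}
    return request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("Once upon a time,")
# With a running instance, the generated text comes back as:
# with request.urlopen(req) as resp:
#     print(json.loads(resp.read())["results"][0]["text"])
```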
For more information, be sure to run the program from the command line with the `--help` flag.