Upstream Changelog:
KoboldCpp is just a 'Dirty Fork' edition 😩
- KoboldCpp now natively supports Local Image Generation, thanks to the phenomenal work done by @leejet in stable-diffusion.cpp! It provides an A1111 compatible txt2img endpoint which you can use within the embedded Kobold Lite, or in many other compatible frontends such as SillyTavern (see the example request just after the changelog).
- Just select a compatible SD1.5 or SDXL .safetensors fp16 model to load, either through the GUI launcher or with --sdconfig
- Enjoy zero-install, portable, lightweight and hassle-free image generation directly from KoboldCpp, without installing multiple GBs worth of ComfyUI, A1111, Fooocus or others.
- With just an 8GB VRAM GPU, you can run a 7B q4 GGUF (lowvram) alongside any SD1.5 image model, as a single instance, fully offloaded. If you run out of VRAM, select Compress Weights (quant) to quantize the image model so it takes less memory.
- KoboldCpp allows you to run in text-gen-only, image-gen-only or hybrid modes; simply set the appropriate launcher configs.
- Known to not work correctly in Vulkan (for now).
- When running from the command line, --contextsize can now be set to any arbitrary number in range instead of being locked to fixed values. However, using a non-recommended value may result in incoherent output depending on your settings. The GUI launcher for this remains unchanged. (See the example launch command just after the changelog.)
- Added new quant types, pulled and merged improvements and fixes from upstream.
- Fixed some issues loading older GGUFv1 models; they should be working again.
- Added Cloudflare tunnel support for macOS (via --remotetunnel; note that it probably won't work on M1, only amd64).
- Updated API docs and Colab for image gen.
- Updated Kobold Lite:
- Integrated support for AllTalk TTS
- Added "Auto Jailbreak" for instruct mode, useful to wrangle stubborn or censored models.
- Auto enable image gen button if KCPP loads image model
- Improved Autoscroll and layout, defaults to SSE streaming mode
- Added option to import and export story via clipboard
- Added option to set personal notes/comments in story
- Update v1.60.1: Ported fix for CVE-2024-21836 for GGUFv1, enabled the LCM sampler, allowed loading gguf SD models, and fixed SD for Metal.
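For example, once an image model is loaded, you can drive it through the A1111-compatible endpoint directly. The sketch below assumes the default port 5001, the standard A1111 route /sdapi/v1/txt2img, and that --sdconfig takes the model path as its first argument; check --help for the exact flag syntax on your build.

```bash
# Launch with a text model plus an SD1.5 image model (flag arguments are an assumption; see --help).
python koboldcpp.py --model yourmodel.gguf --sdconfig sd15-model.safetensors

# Request an image from the A1111-compatible txt2img endpoint; the JSON response
# carries base64-encoded image data, as in A1111's own API.
curl -s http://localhost:5001/sdapi/v1/txt2img \
  -H "Content-Type: application/json" \
  -d '{"prompt": "a watercolor lighthouse at dusk", "width": 512, "height": 512, "steps": 20}'
```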
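Likewise, a quick illustration of the relaxed --contextsize: any in-range value now works from the command line, for instance:

```bash
# 6144 is an arbitrary illustrative value; unusual sizes may hurt coherence, as noted above.
python koboldcpp.py --model yourmodel.gguf --contextsize 6144
```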
To use on Windows, download and run koboldcpp_rocm.exe, which is a one-file pyinstaller build, OR download koboldcpp_rocm_files.zip and run python koboldcpp.py
(additional Python pip modules might need to be installed, such as customtkinter and tk or python-tk).
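If you go the .zip/source route, the missing GUI dependencies can usually be pulled in with pip; a minimal sketch (exact requirements may vary by system):

```bash
# Only needed when running from source; the one-file .exe already bundles its dependencies.
pip install customtkinter
python koboldcpp.py --help
```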
To use on Linux, clone the repo and build with make LLAMA_HIPBLAS=1 -j4
(-j4 can be adjusted to your number of CPU threads for faster build times)
For a full Linux build, make sure you have the OpenBLAS and CLBlast packages installed:
For Arch Linux: install cblas, openblas, and clblast.
For Debian: install libclblast-dev and libopenblas-dev.
Then run make LLAMA_HIPBLAS=1 LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 -j4
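As a concrete sketch on Debian/Ubuntu, using the package names above (Arch users would install the pacman equivalents instead):

```bash
# Install the OpenBLAS and CLBlast dev packages, then build with HIPBLAS, OpenBLAS and CLBlast enabled.
sudo apt install libopenblas-dev libclblast-dev
make LLAMA_HIPBLAS=1 LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 -j4
```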
If you're using NVIDIA, you can try koboldcpp.exe from LostRuins' upstream repo.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller, also from LostRuins' repo.
Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
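If you prefer to talk to it programmatically rather than through the browser UI, the KoboldAI-compatible text generation API is served on the same port. A minimal sketch, assuming the default port 5001 and the standard /api/v1/generate route:

```bash
# Generate a short continuation; other sampler fields (temperature, top_p, ...) can be added to the JSON.
curl -s http://localhost:5001/api/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Once upon a time,", "max_length": 80}'
```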
For more information, be sure to run the program from command line with the --help flag.