koboldcpp-1.113
intermission edition
- Breaking change: Split mode is now set independently, in order to accommodate tensor split mode (experimental). This more closely follows the syntax used upstream with `--split-mode`.
  - Example Before: `--usecuda rowsplit`
  - Example Now: `--usecuda --splitmode (layer/row/tensor)`
- Allow a runtime image LoRA directory to be easily selected, which allows all image gen LoRAs inside to be loaded via the `<lora:filename:weight>` prompt syntax
- Fixed q5_1 KV type not using the GPU correctly in CUDA
- Handle some instances of duplicate think tags
- Speed up Ace Step VAE decoding by increasing chunk size. If it doesn't fit, remember to use `--musiclowvram`
- Allow customizing the multiuser queue limit via the GUI
- Various minor updates and fixes for sd.cpp, thanks to @wbruna
- Qwen3TTS speedups by switching back to original precision
- Switched wav output from ulaw back to pcm16.
- Avoid saving deprecated args in new kcpps files
- Added `--reqtimeout`, which allows you to specify the desired timeout for some requests, such as router mode
- Improved screen reader accessibility for the MusicUI
- Fixed handling when IPv4 is not available
- Handle seed-oss think format
- Fixed some config loading issues
- Fixed a bug where mmap used much more memory than expected due to CPU repacking
- Fixed an issue when prefilling a turn in jinja mode, thanks @Reithan
- Updated Kobold Lite, multiple fixes and improvements
- Merged fixes, new model support, and improvements from upstream
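The breaking split-mode change above amounts to a small launch-command migration. A minimal sketch (the model path and `--model` flag are illustrative placeholders; the split-mode flags are taken from the notes above):

```shell
# Before (older versions): row split was passed as part of --usecuda
./koboldcpp-linux-x64 --model model.gguf --usecuda rowsplit

# Now (1.113+): split mode is its own flag, choosing layer, row,
# or the experimental tensor mode
./koboldcpp-linux-x64 --model model.gguf --usecuda --splitmode row
```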
Download and run the koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), which is a one-file pyinstaller for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try the oldpc version instead (CUDA 11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build first, for best support. Alternatively, you can download our rolling ROCm binary here if you use Linux.
If you're on a modern MacOS (M-Series) you can use the koboldcpp-mac-arm64 MacOS binary.
Click here for .gguf conversion and quantization tools
Newer rolling experimental builds can be found here, these are auto-updated and may be unstable.
Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Then, once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
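Besides connecting through a browser, you can also query a running instance programmatically. A minimal sketch using the KoboldAI-compatible generate endpoint (the prompt and `max_length` values are illustrative, and this assumes the server is running on the default port 5001):

```shell
# Send a simple text generation request to a running KoboldCpp server
curl -s http://localhost:5001/api/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, world.", "max_length": 32}'
```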
For more information, be sure to run the program from command line with the --help flag. You can also refer to the readme and the wiki.