koboldcpp-1.113
intermission edition
- Breaking change: Split mode is now set independently, in order to accommodate tensor split mode (experimental). This more closely follows the syntax used upstream with `--split-mode`.
  - Example Before: `--usecuda rowsplit`
  - Example Now: `--usecuda --splitmode (layer/row/tensor)`
- Allow a runtime image LoRA directory to be easily selected, which allows all image gen LoRAs inside to be loaded via the `<lora:filename:weight>` prompt syntax
- Fixed q5_1 KV type not using the GPU correctly in CUDA
- Handle some instances of duplicate think tags
- Speed up Ace Step VAE decoding by increasing chunk size. If it doesn't fit, remember to use `--musiclowvram`
- Allow customizing the multiuser queue limit via the GUI
- Various minor updates and fixes for sd.cpp, thanks to @wbruna
- Qwen3TTS speedups by switching back to original precision
- Switched wav output from ulaw back to pcm16.
- Avoid saving deprecated args in new kcpps files
- Added `--reqtimeout`, which allows you to specify the desired timeout for some requests, such as router mode
- Improved screen reader accessibility for the MusicUI
- Fixed handling when IPv4 is not available
- Handle seed-oss think format
- Fixed some config loading issues
- Fixed a bug where mmap used much more memory than expected due to CPU repacking
- Fixed an issue when prefilling a turn in jinja mode, thanks @Reithan
- Updated Kobold Lite, multiple fixes and improvements
- Merged fixes, new model support, and improvements from upstream
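The breaking split-mode change above amounts to a small launch-command migration. A minimal sketch (the model path and `--model` flag are illustrative placeholders; the split-mode flags are taken from the notes above):

```shell
# Before (older versions): row split was passed as part of --usecuda
./koboldcpp-linux-x64 --model model.gguf --usecuda rowsplit

# Now (1.113+): split mode is its own flag, choosing layer, row,
# or the experimental tensor mode
./koboldcpp-linux-x64 --model model.gguf --usecuda --splitmode row
```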
Download and run the koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), which is a one-file pyinstaller for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try the oldpc version instead (CUDA 11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build first, for best support. Alternatively, you can download our rolling ROCm binary here if you use Linux.
If you're on a modern MacOS (M-Series) you can use the koboldcpp-mac-arm64 MacOS binary.
Click here for .gguf conversion and quantization tools
Newer rolling experimental builds can be found here, these are auto-updated and may be unstable.
Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Then, once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
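Besides connecting through a browser, you can also query a running instance programmatically. A minimal sketch using the KoboldAI-compatible generate endpoint (the prompt and `max_length` values are illustrative, and this assumes the server is running on the default port 5001):

```shell
# Send a simple text generation request to a running KoboldCpp server
curl -s http://localhost:5001/api/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, world.", "max_length": 32}'
```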
For more information, be sure to run the program from command line with the --help flag. You can also refer to the readme and the wiki.