koboldcpp-1.108
- Try to fix broken pipe errors due to timeouts during long tool calls
- Updated SDUI, added toggle to send img2img as a reference.
- Added ollama /api/show endpoint emulation (see the example after this list)
- Try to fix autofit on ROCm going OOM
- Improved MCP behavior with multipart content
- Prevent swapping config at runtime from changing the download directory
- Adjust GUI for fractional scaling
- Fixed incorrect output filename paths in some cases
- Improved llama.cpp UI handling of common think tags
- --autofitmode now hides the GUI layers selector
- Fixed extra spam from autofit mode
- Autofit toggle is now in the Quick Launch menu
- Autofit is now triggered if -1 gpulayers (the default) is selected and tensor splits or tensor overrides are not set. Setting your own GPU layers overrides this behavior
- Now allow the Image Gen soft limit to be overridden to 2048x2048 if the user chooses. Note that this may crash if you don't know what you're doing.
- Updated upstream stable-diffusion.cpp by @wbruna
- Updated Kobold Lite, multiple fixes and improvements
- Merged fixes, new model support, and improvements from upstream
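As a quick illustration of the new ollama /api/show endpoint emulation mentioned above, a request like the sketch below should return metadata for the loaded model. This is only a minimal example assuming the default port 5001 and that the emulation accepts the standard ollama-style JSON body; the model name shown is a placeholder.

```bash
# Query the emulated ollama /api/show endpoint on a running koboldcpp instance.
# Assumes koboldcpp is listening on the default port 5001; "my-loaded-model" is a placeholder name.
curl -s http://localhost:5001/api/show \
  -H "Content-Type: application/json" \
  -d '{"model": "my-loaded-model"}'
```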
Download and run the koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), which is a one-file pyinstaller for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try the oldpc version instead (CUDA 11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build first, for best support. Alternatively, you can download our rolling ROCm binary here if you use Linux.
If you're on a modern Mac (M-series), you can use the koboldcpp-mac-arm64 macOS binary.
Click here for .gguf conversion and quantization tools
Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
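For example, a command-line launch might look like the sketch below. The model path is a placeholder; --model, --contextsize, and --port are commonly used flags, and leaving --gpulayers at its default lets the new autofit behavior pick the GPU layer count.

```bash
# Example launch (Linux binary shown); the model path is a placeholder.
# --gpulayers is left at its default of -1, so autofit chooses the layer count.
./koboldcpp-linux-x64 --model ./MyModel-Q4_K_M.gguf --contextsize 4096 --port 5001
```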
Once the model is loaded, you can connect in your browser (or use the full KoboldAI client) at:
http://localhost:5001
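You can also test generation directly over the KoboldAI-compatible API. The sketch below assumes the default port and a loaded model; the prompt and max_length values are arbitrary.

```bash
# Minimal text generation request against the KoboldAI-compatible API served by koboldcpp.
curl -s http://localhost:5001/api/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, my name is", "max_length": 50}'
```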
For more information, be sure to run the program from command line with the --help flag. You can also refer to the readme and the wiki.