koboldcpp-1.91

3 months ago


Entering search mode edition


  • NEW: Hugging Face Model Search Tool - Grabbing a model has never been easier! KoboldCpp now comes with a HF model browser, so you can search for and find the GGUF models you like directly from Hugging Face. Simply search for and select a model, and it will be downloaded before launch.
  • Embedded aria2c downloader for Windows builds - this provides extremely fast downloads and is automatically used when downloading models via provided URLs.
  • Added a CUDA target for compute capability 3.5. This may allow KoboldCpp to be used with older cards such as the K6000, GTX 780, and K80. I have received some success stories - if you do try it, share your experience on the discussions page!
  • Reduced CUDA binary sizes by switching most CUDA compute capability targets to virtual, thanks to a good suggestion from Johannes at ggml-org#13135
  • Improved ComfyUI emulation: it can now adapt to any kind of workflow, so long as a KSampler node is connected to a text prompt somewhere in it.
  • Fixed GLM-4 prompt handling even for quants with incorrect BOS set.
  • Added support for Classifier-Free Guidance (CFG) since I wanted to mess with it. At long last CFG is in, but I don't really like it - results are not great. Anyway, if you wish to use it, simply check Enable Guidance or use --enableguidance, then set a negative prompt and CFG scale from the Lite tokens menu. Note that guidance doubles KV usage and halves generation speed. Overall, it was a disappointing addition and not really worth the effort.
  • StableUI now clears the queue when a generation is cancelled
  • Further fixes for Zenity/YAD in multilingual environments
  • Removed flash attention limits and warnings for Vulkan
  • Updated Kobold Lite, multiple fixes and improvements
    • Important Change: KoboldCppAuto is now the default instruct preset. This lets the KoboldCpp backend automatically choose the correct instruct tags at runtime, based on the loaded model. This is done transparently in the backend and is not visible to the user. If it doesn't work properly, you can still switch to your preferred instruct format (e.g. Alpaca).
    • NEW: Corpo mode now supports Text mode and Adventure mode as well, making it usable in all 4 modes.
    • Added quick save and delete buttons for Corpo mode.
    • Added Pollinations.ai as an option for TTS and Image Gen (optional online service)
    • Instruct placeholders are now always used (but you can change what they map to, including themselves)
    • Added confirmation box for loading from slots
    • Improved think tag handling and output formatting.
    • Added a new scenario: Nemesis
    • The "Chat match any name" setting is no longer on by default
    • Fixed autoscroll jumping when editing in Corpo mode
    • Fix char spec v2 embedded WI import by @Cohee1207
  • Merged fixes and improvements from upstream

To use, download and run koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
If you have an Nvidia GPU but an old CPU and koboldcpp.exe does not work, try koboldcpp_oldcpu.exe
If you have a newer Nvidia GPU, you can use the CUDA 12 version koboldcpp_cu12.exe (much larger, slightly faster).
If you're using Linux, select the appropriate Linux binary file instead (not exe).
If you're on modern macOS (M1, M2, M3), you can try the koboldcpp-mac-arm64 macOS binary.
If you're using AMD, we recommend trying the Vulkan option (available in all releases) first for best support. Alternatively, you can try koboldcpp_rocm from YellowRoseCx's fork.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
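Beyond the browser UI, a running instance can also be queried programmatically. Here is a minimal sketch, assuming the server is up on the default port (5001) and exposes the KoboldAI-compatible `/api/v1/generate` endpoint; the parameter names and response shape follow the KoboldAI API and may differ across versions, so treat this as illustrative rather than authoritative.

```python
# Sketch: query a local KoboldCpp instance over its KoboldAI-compatible
# HTTP API. Assumes the default port and endpoint (may vary by version).
import json
import urllib.request

API_URL = "http://localhost:5001/api/v1/generate"


def build_payload(prompt: str, max_length: int = 80) -> dict:
    """Assemble a generation request body (KoboldAI-style field names)."""
    return {
        "prompt": prompt,
        "max_length": max_length,  # number of tokens to generate
        "temperature": 0.7,
    }


def generate(prompt: str) -> str:
    """POST the prompt and return the generated text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The KoboldAI API nests output under results[0].text
    return body["results"][0]["text"]


if __name__ == "__main__":
    print(generate("Once upon a time"))
```

The same payload works from curl or any HTTP client; `--help` and the wiki list the full set of supported endpoints.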

For more information, be sure to run the program from the command line with the --help flag. You can also refer to the readme and the wiki.
