koboldcpp-1.80

End of the year edition

NEW: Added multimodal image support with Qwen2-VL! You can grab the quantized mmproj here for the 2B and 7B models, and then grab the 2B or 7B Instruct models from Bartowski.
- Note: Qwen2-VL vision is not working on Vulkan currently. The model will load and generate text fine, but it's unable to recognize anything. Works fine on CUDA and CPU. Follow ggerganov#10843
- For a quick start, here's a working template you can use
NEW: Vulkan now has coopmat1 support, making it significantly faster on modern Nvidia cards (credits @0cc4m)
Added a few new QoL flags:
- --moeexperts - Overwrite the number of experts to use in MoE models
- --failsafe - A proper way to set failsafe mode, which disables all CPU intrinsics and GPU usage.
- --draftgpulayers - Set number of layers to offload for speculative decoding draft model
- --draftgpusplit - GPU layer distribution ratio for draft model (default=same as main). Only works if using multi-GPUs.
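
As a rough sketch of how the new flags combine on one command line, the Python snippet below assembles an argument list without launching anything. The file names, layer count and expert count are placeholders. The --model and --draftmodel flags are assumed from earlier releases rather than taken from this changelog, and passing the --draftgpusplit ratio as space-separated numbers is also an assumption, so check --help for the exact formats.

```python
import shlex

def build_launch_args(model, draft_model=None, draft_gpu_layers=None,
                      draft_gpu_split=None, moe_experts=None, failsafe=False):
    """Assemble a koboldcpp argument list. Nothing is executed here."""
    args = ["--model", model]  # assumed flag, not listed in this changelog
    if moe_experts is not None:
        # Overwrite the number of experts used in MoE models
        args += ["--moeexperts", str(moe_experts)]
    if draft_model is not None:
        args += ["--draftmodel", draft_model]  # assumed flag from earlier releases
        if draft_gpu_layers is not None:
            args += ["--draftgpulayers", str(draft_gpu_layers)]
        if draft_gpu_split is not None:
            # Assumption: the ratio is given as space-separated numbers
            args += ["--draftgpusplit"] + [str(x) for x in draft_gpu_split]
    if failsafe:
        # Disables all CPU intrinsics and GPU usage
        args.append("--failsafe")
    return args

# Placeholder file names and values, for illustration only
args = build_launch_args("main-model.gguf", draft_model="draft-model.gguf",
                         draft_gpu_layers=20, moe_experts=2)
print(shlex.join(["python", "koboldcpp.py"] + args))
```

Building the list first and printing it makes it easy to inspect the final command before running it by hand.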
Fixes for buggy tkinter GUI launcher window in Linux (thanks @henk717)
Restored support for ARM quants in Kobold (e.g. Q4_0_4_4), but you should consider switching to q4_0 eventually.
Fixed a bug that caused context corruption when aborting a generation partway through processing a prompt
Added a new suppress_non_speech field to Whisper, which bans "noise annotation" logits (e.g. Barking, Doorbell, Chime, Muzak)
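
A minimal sketch of a transcription request body that sets the new field. Only suppress_non_speech comes from this changelog. The audio_data field name (base64-encoded audio) and the /api/extra/transcribe path are assumptions about the existing Whisper endpoint, so verify them against the API documentation before relying on them.

```python
import base64
import json

# Assumed endpoint path, not stated in this changelog
TRANSCRIBE_PATH = "/api/extra/transcribe"

def build_transcribe_payload(audio_bytes, suppress_non_speech=True):
    """Build a JSON-serializable request body for a transcription call."""
    return {
        # Assumed field name: audio sent as a base64 string
        "audio_data": base64.b64encode(audio_bytes).decode("ascii"),
        # New in 1.80: ban "noise annotation" tokens such as (Doorbell)
        "suppress_non_speech": suppress_non_speech,
    }

# Placeholder bytes stand in for the contents of a real audio file
payload = build_transcribe_payload(b"placeholder-audio-bytes")
body = json.dumps(payload)
print(body)
```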
Improved compile flags on ARM: self-compiled builds now use correct native flags and should be significantly faster (tested on Pi and Termux). Simply run make for native ARM builds, or make LLAMA_PORTABLE=1 for a slower portable build.
trim_stop now defaults to true (output will no longer contain the stop sequence by default)
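
To make the change concrete, the sketch below mimics the two behaviours locally. It is an illustration of the effect only, not koboldcpp's implementation. The request body at the end shows how a client that relied on the old behaviour might opt back in; stop_sequence is assumed to be the existing generate field for stop strings, and the prompt is a placeholder.

```python
def apply_stop(text, stop_sequences, trim_stop=True):
    """Cut text at the earliest stop sequence, dropping or keeping the sequence."""
    best = None
    for seq in stop_sequences:
        idx = text.find(seq)
        # Ignore empty sequences and keep the earliest match
        if seq and idx != -1 and (best is None or idx < best[0]):
            best = (idx, seq)
    if best is None:
        return text
    idx, seq = best
    return text[:idx] if trim_stop else text[:idx + len(seq)]

raw = "Hello there\nUser: hi"
print(repr(apply_stop(raw, ["\nUser:"])))                   # new default
print(repr(apply_stop(raw, ["\nUser:"], trim_stop=False)))  # old behaviour

# Hypothetical request body that restores the old behaviour explicitly
request_body = {
    "prompt": "placeholder prompt",
    "stop_sequence": ["\nUser:"],  # assumed field name
    "trim_stop": False,
}
```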
Debug mode now shows drafted tokens and, when enabled, allows incompatible vocabs for speculative decoding (not recommended)
Handle more generation parameters in ollama API emulation
Handle pyinstaller temp paths for chat adapters when saving a kcpps config file
Default image gen sampler set to Euler
MMQ is now the default for CLI as well. Use the nommq flag to disable it (e.g. --usecublas all nommq). Old flags still work.
Upgrade build to use C++17
Always use PCI Bus ID order for CUDA GPU listing consistency (match nvidia-smi)
Updated Kobold Lite, multiple fixes and improvements
- NEW: Added LaTeX rendering together with markdown. Uses standard \[...\], $...$ and $$...$$ syntax.
- You can now manually upload an audio file to transcribe in settings.
- Better regex to trigger image generation
- Aesthetic UI fixes
- Added q as an alias to query for direct URL querying (e.g. http://localhost:5001?q=what+is+love)
- Added support for AllTalk v2 API. AllTalk v1 is still supported automatically (credits @erew123)
- Added support for Mantella XTTS (XTTS fork)
- Toggle to disable "non-speech" whisper output (see above)
- Consolidated Instruct templates (Mistral V3 merged to V7)
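
For the q alias in the Kobold Lite list above, a query URL can be built with standard URL encoding. This is a small sketch: the helper name is made up for illustration, and the base address is the default local one from the example.

```python
from urllib.parse import urlencode

def lite_query_url(question, base="http://localhost:5001"):
    """Return a Kobold Lite URL that submits the question via the q alias."""
    # urlencode turns spaces into + and escapes reserved characters
    return f"{base}?{urlencode({'q': question})}"

print(lite_query_url("what is love"))
```

Encoding the question this way keeps characters such as & or # from being read as part of the URL structure.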
Merged fixes and improvements from upstream

To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
If you have an Nvidia GPU, but use an old CPU and koboldcpp.exe does not work, try koboldcpp_oldcpu.exe
If you have a newer Nvidia GPU, you can use the CUDA 12 version koboldcpp_cu12.exe (much larger, slightly faster).
If you're using Linux, select the appropriate Linux binary file instead (not exe).
If you're on a modern Mac (M1, M2, M3), you can try the koboldcpp-mac-arm64 macOS binary.
If you're using AMD, we recommend trying the Vulkan option (available in all releases) first, for best support. Alternatively, you can try koboldcpp_rocm at YellowRoseCx's fork here

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
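
Besides opening that address in a browser, a script can check that the server is up. The sketch below assumes the KoboldAI-style /api/v1/model endpoint is available and returns a JSON object with a result field; treat both as assumptions and adjust if your build differs.

```python
import json
import urllib.request

def api_url(path, base="http://localhost:5001"):
    """Join the base address and an API path with exactly one slash."""
    return base.rstrip("/") + "/" + path.lstrip("/")

def loaded_model(base="http://localhost:5001", timeout=5):
    """Ask the running server which model it has loaded (assumed endpoint)."""
    with urllib.request.urlopen(api_url("/api/v1/model", base), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8")).get("result")

if __name__ == "__main__":
    try:
        print("Loaded model:", loaded_model())
    except (OSError, ValueError) as exc:
        # Connection failures are OSError subclasses; bad JSON raises ValueError
        print("Server not reachable or unexpected reply:", exc)
```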

For more information, be sure to run the program from command line with the --help flag. You can also refer to the readme and the wiki.

LostRuins/koboldcpp v1.80 koboldcpp-1.80 on GitHub