koboldcpp-1.83.1
The Kobo :.|:; Edition
- NEW: Added the ability to switch models, settings and configs at runtime! This also allows for remote model swapping. Credits to @esolithe for the original reference implementation.
- Launch with `--admin` to enable this feature, and also provide `--admindir` containing `.kcpps` launch configs.
- Optionally, provide `--adminpassword` to secure admin functions.
- You will be able to swap between any model's config at runtime from the Admin panel in Lite. You can prepare .kcpps configs for different layers, backends, models, etc.
- KoboldCpp will then terminate the current instance and relaunch to a new config.
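Putting the admin flags together, a launch command might look like the sketch below (the config directory path, password, and model filename are illustrative placeholders, not defaults):

```shell
# Enable the runtime admin panel; --admindir points at a folder of prepared .kcpps configs.
# Directory, password, and model name below are example values only.
koboldcpp.exe --admin --admindir C:\kobold\configs --adminpassword mysecret --model model.gguf
```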
- Added new backend options for CLBlast (Regular, OldCPU and OlderCPU), for avx2, avx and noavx respectively, ensuring a usable GPU alternative for CPUs of all ranges and ages.
- CLIP vision embeddings can now be reused between multiple requests, so they won't have to be reprocessed if the images don't change.
- Context shifting is now disabled when using mrope (used in Qwen2VL), as it does not work correctly with it.
- Now defaults to AutoGuess for chat completions adapter. Set to "Alpaca" for the old behavior instead.
- You can now set the maximum resolution accepted by vision mmprojs with `--visionmaxres`. Images larger than that will be downscaled before processing.
- You can now set a length limit for TTS with `--ttsmaxlen` when launching; this limits the number of TTS tokens allowed to be generated (range 512 to 4096). Each 1s of audio is about 75 tokens.
- Fixed a bug with TTS that could cause a crash.
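As a sketch, the two new limits can be combined at launch; the values below are arbitrary examples within the documented TTS range, and the model filename is a placeholder:

```shell
# Downscale incoming images above 1024px before the vision mmproj processes them,
# and cap TTS generation at 2048 tokens (roughly 27 seconds of audio at ~75 tokens/sec).
koboldcpp --visionmaxres 1024 --ttsmaxlen 2048 --model model.gguf
```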
- Added cloudflared tunnel download for aarch64 (thanks @FlippFuzz). Also, allowed SSL combined with remote tunnels.
- Updated Kobold Lite, multiple fixes and improvements
- NEW: Added deepseek instruct template, and added support for reasoning/thinking template tags. You can configure thinking rendering behavior from Context > Tokens > Thinking
- NEW: Finally allows specifying individual start and end instruct tags instead of combining them. Toggle this in Settings > Toggle End Tags.
- NEW: Multi-pass websearch added. This allows you to specify a template that is used to generate the search query.
- Added a websearch toggle button
- TTS now allows downloading the audio output as a file when testing it, instead of just playing the sound.
- Some regex parsing fixes
- Added admin panel
- Merged fixes and improvements from upstream
Hotfix 1.83.1 - Fixed crashes in non-gguf models due to autoguess adapter. Also reverts to single process only when not in admin mode.
To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
If you have an Nvidia GPU, but use an old CPU and koboldcpp.exe does not work, try koboldcpp_oldcpu.exe
If you have a newer Nvidia GPU, you can use the CUDA 12 version koboldcpp_cu12.exe (much larger, slightly faster).
If you're using Linux, select the appropriate Linux binary file instead (not exe).
If you're on a modern macOS machine (M1, M2, M3), you can try the koboldcpp-mac-arm64 MacOS binary.
If you're using AMD, we recommend trying the Vulkan option (available in all releases) first, for best support. Alternatively, you can try koboldcpp_rocm at YellowRoseCx's fork here
Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once loaded, you can connect at http://localhost:5001 (or use the full koboldai client).
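As a minimal sketch of connecting programmatically instead of through the browser, the snippet below posts a prompt to the KoboldAI-compatible generate endpoint using only the Python standard library (the prompt text and max_length value are arbitrary examples):

```python
import json
import urllib.request

# Example payload for KoboldCpp's KoboldAI-compatible /api/v1/generate endpoint.
payload = {"prompt": "Hello, how are you?", "max_length": 80}

req = urllib.request.Request(
    "http://localhost:5001/api/v1/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
try:
    with urllib.request.urlopen(req, timeout=10) as resp:
        # The response carries the generated text under results[0].text.
        print(json.loads(resp.read())["results"][0]["text"])
except OSError as exc:
    # No KoboldCpp instance listening on localhost:5001.
    print(f"Could not reach KoboldCpp: {exc}")
```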
For more information, be sure to run the program from the command line with the --help flag. You can also refer to the readme and the wiki.