koboldcpp-1.83.1
The Kobo :.|:; Edition
- NEW: Added the ability to switch models, settings and configs at runtime! This also allows for remote model swapping. Credits to @esolithe for the original reference implementation.
- Launch with `--admin` to enable this feature, and also provide `--admindir` containing `.kcpps` launch configs.
- Optionally, provide `--adminpassword` to secure admin functions.
- You will be able to swap between any model's config at runtime from the Admin panel in Lite. You can prepare .kcpps configs for different layers, backends, models, etc.
- KoboldCpp will then terminate the current instance and relaunch to a new config.
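Putting the admin flags together, a launch command might look like the sketch below (the config directory path, password, and model filename are illustrative placeholders, not defaults):

```shell
# Enable the runtime admin panel; --admindir points at a folder of prepared .kcpps configs.
# Directory, password, and model name below are example values only.
koboldcpp.exe --admin --admindir C:\kobold\configs --adminpassword mysecret --model model.gguf
```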
- Added new backend options for CLBlast (Regular, OldCPU and OlderCPU), for avx2, avx and noavx respectively, ensuring a usable GPU alternative for CPUs of all ranges and ages.
- CLIP vision embeddings can now be reused between multiple requests, so they won't have to be reprocessed if the images don't change.
- Context shifting is now disabled when using mrope (used in Qwen2VL), as it does not work correctly with it.
- Now defaults to AutoGuess for chat completions adapter. Set to "Alpaca" for the old behavior instead.
- You can now set the maximum resolution accepted by vision mmprojs with `--visionmaxres`. Images larger than that will be downscaled before processing.
- You can now set a length limit for TTS with `--ttsmaxlen` when launching; this limits the number of TTS tokens allowed to be generated (range 512 to 4096). Each 1s of audio is about 75 tokens.
- Fixed a bug with TTS that could cause a crash.
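As a sketch, the two new limits can be combined at launch; the values below are arbitrary examples within the documented TTS range, and the model filename is a placeholder:

```shell
# Downscale incoming images above 1024px before the vision mmproj processes them,
# and cap TTS generation at 2048 tokens (roughly 27 seconds of audio at ~75 tokens/sec).
koboldcpp --visionmaxres 1024 --ttsmaxlen 2048 --model model.gguf
```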
- Added cloudflared tunnel download for aarch64 (thanks @FlippFuzz). Also, allowed SSL combined with remote tunnels.
- Updated Kobold Lite, multiple fixes and improvements
- NEW: Added deepseek instruct template, and added support for reasoning/thinking template tags. You can configure thinking rendering behavior from Context > Tokens > Thinking
- NEW: Finally allows specifying individual start and end instruct tags instead of combining them. Toggle this in Settings > Toggle End Tags.
- NEW: Multi-pass websearch added. This allows you to specify a template that is used to generate the search query.
- Added a websearch toggle button
- TTS now allows downloading the audio output as a file when testing it, instead of just playing the sound.
- Some regex parsing fixes
- Added admin panel
- Merged fixes and improvements from upstream
Hotfix 1.83.1 - Fixed crashes in non-gguf models due to autoguess adapter. Also reverts to single process only when not in admin mode.
To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
If you have an Nvidia GPU, but use an old CPU and koboldcpp.exe does not work, try koboldcpp_oldcpu.exe
If you have a newer Nvidia GPU, you can use the CUDA 12 version koboldcpp_cu12.exe (much larger, slightly faster).
If you're using Linux, select the appropriate Linux binary file instead (not exe).
If you're on a modern macOS machine (M1, M2, M3), you can try the koboldcpp-mac-arm64 MacOS binary.
If you're using AMD, we recommend trying the Vulkan option (available in all releases) first, for best support. Alternatively, you can try koboldcpp_rocm at YellowRoseCx's fork here
Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once loaded, you can connect at http://localhost:5001 (or use the full koboldai client).
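As a minimal sketch of connecting programmatically instead of through the browser, the snippet below posts a prompt to the KoboldAI-compatible generate endpoint using only the Python standard library (the prompt text and max_length value are arbitrary examples):

```python
import json
import urllib.request

# Example payload for KoboldCpp's KoboldAI-compatible /api/v1/generate endpoint.
payload = {"prompt": "Hello, how are you?", "max_length": 80}

req = urllib.request.Request(
    "http://localhost:5001/api/v1/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
try:
    with urllib.request.urlopen(req, timeout=10) as resp:
        # The response carries the generated text under results[0].text.
        print(json.loads(resp.read())["results"][0]["text"])
except OSError as exc:
    # No KoboldCpp instance listening on localhost:5001.
    print(f"Could not reach KoboldCpp: {exc}")
```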
For more information, be sure to run the program from the command line with the --help flag. You can also refer to the readme and the wiki.