github LostRuins/koboldcpp v1.83.1
koboldcpp-1.83.1

The Kobo :.|:; Edition

  • NEW: Added the ability to switch models, settings and configs at runtime! This also allows for remote model swapping. Credits to @esolithe for original reference implementation.
    • Launch with --admin to enable this feature, and also provide --admindir containing .kcpps launch configs.
    • Optionally, provide --adminpassword to secure admin functions
    • You will be able to swap between any model's config at runtime from the Admin panel in Lite. You can prepare .kcpps configs for different layers, backends, models, etc.
    • KoboldCpp will then terminate the current instance and relaunch to a new config.
  • Added new backend options for CLBlast (Regular, OldCPU and OlderCPU), targeting avx2, avx and noavx respectively, ensuring a usable GPU alternative for CPUs of all ranges and ages.
  • CLIP vision embeddings can now be reused between multiple requests, so they won't have to be reprocessed if the images don't change.
  • Context shifting is now disabled when using mrope (used in Qwen2VL), as it does not work correctly with it.
  • The chat completions adapter now defaults to AutoGuess. Set it to "Alpaca" for the old behavior instead.
  • You can now set the maximum resolution accepted by vision mmprojs with --visionmaxres. Images larger than that will be downscaled before processing.
  • You can now set a length limit for TTS with --ttsmaxlen when launching. This limits the number of TTS tokens allowed to be generated (range 512 to 4096). Each 1s of audio is about 75 tokens.
  • Fixed a bug with TTS that could cause a crash.
  • Added cloudflared tunnel download for aarch64 (thanks @FlippFuzz). Also, allowed SSL combined with remote tunnels.
  • Updated Kobold Lite, multiple fixes and improvements
    • NEW: Added deepseek instruct template, and added support for reasoning/thinking template tags. You can configure thinking rendering behavior from Context > Tokens > Thinking
    • NEW: Finally allows specifying individual start and end instruct tags instead of combining them. Toggle this in Settings > Toggle End Tags.
    • NEW: Multi-pass websearch added. This allows you to specify a template that is used to generate the search query.
    • Added a websearch toggle button
    • TTS now allows downloading the audio output as a file when testing it, instead of just playing the sound.
    • Some regex parsing fixes
    • Added admin panel
  • Merged fixes and improvements from upstream
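
The --visionmaxres cap described above amounts to shrinking an image so its longest side fits the limit while preserving aspect ratio. A rough sketch of that idea (the function name and the 1024 default are illustrative only; KoboldCpp's actual resampling logic may differ):

```python
def downscale_for_vision(width: int, height: int, maxres: int = 1024) -> tuple[int, int]:
    """Shrink dimensions so the longest side fits a --visionmaxres-style cap,
    preserving aspect ratio. Hypothetical helper, not part of KoboldCpp."""
    longest = max(width, height)
    if longest <= maxres:
        # Already within the limit: no downscaling needed.
        return width, height
    scale = maxres / longest
    return round(width * scale), round(height * scale)
```

For example, a 2048x1536 image with a 1024 cap would be reduced to 1024x768 before the vision mmproj processes it.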
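
To put the --ttsmaxlen range in perspective, the quoted ~75 tokens per second of audio implies a rough duration budget. A minimal sketch of that arithmetic (the helper name is illustrative, not part of KoboldCpp):

```python
TOKENS_PER_SECOND = 75  # approximation quoted in the release notes

def max_audio_seconds(ttsmaxlen: int) -> float:
    """Approximate audio duration allowed by a given --ttsmaxlen budget."""
    if not 512 <= ttsmaxlen <= 4096:
        raise ValueError("--ttsmaxlen accepts values from 512 to 4096")
    return ttsmaxlen / TOKENS_PER_SECOND
```

So the minimum of 512 tokens allows roughly 7 seconds of audio, and the maximum of 4096 tokens roughly 55 seconds.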

Hotfix 1.83.1 - Fixed crashes in non-GGUF models caused by the AutoGuess adapter. Also reverts to running as a single process when not in admin mode.

To use, download and run koboldcpp.exe, which is a one-file pyinstaller build.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller.
If you have an Nvidia GPU but an old CPU and koboldcpp.exe does not work, try koboldcpp_oldcpu.exe.
If you have a newer Nvidia GPU, you can use the CUDA 12 version, koboldcpp_cu12.exe (much larger, slightly faster).
If you're using Linux, select the appropriate Linux binary instead (not an exe).
If you're on a modern macOS device (Apple Silicon: M1, M2, M3), you can try the koboldcpp-mac-arm64 binary.
If you're using AMD, we recommend trying the Vulkan option (available in all releases) first for best support. Alternatively, you can try YellowRoseCx's koboldcpp_rocm fork.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once the model is loaded, you can connect here (or use the full KoboldAI client):
http://localhost:5001
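
Once the server is up, you can also drive it programmatically via the KoboldAI-style /api/v1/generate endpoint that KoboldCpp exposes. A minimal sketch using only the standard library (only the basic payload fields are shown; adjust host, port, and sampler parameters to your setup):

```python
import json
from urllib import request

# Default KoboldCpp address; change if you launched with a different --port.
API_URL = "http://localhost:5001/api/v1/generate"

def build_payload(prompt: str, max_length: int = 80) -> bytes:
    """Build the JSON body for a basic generate request."""
    return json.dumps({"prompt": prompt, "max_length": max_length}).encode()

def generate(prompt: str) -> str:
    """Send a prompt to a running KoboldCpp instance and return the completion."""
    req = request.Request(
        API_URL,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["results"][0]["text"]
```

This requires a model to already be loaded; the response arrives as JSON with the generated text under `results[0].text`.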

For more information, be sure to run the program from the command line with the --help flag. You can also refer to the readme and the wiki.
