github LostRuins/koboldcpp v1.110
koboldcpp-1.110



KoboldCpp 3 Year Anniversary Edition

(Attached video: PleadBoy.mp4)
  • NEW: OpenAI Compatible Router Mode - Automatic model and config hotswapping is finally available in the OpenAI Compatible API. Note that this works differently from the llama.cpp version; it is closer to llama-swap, allowing full config reloads similar to the existing admin endpoint, but triggered from within an ordinary request/response cycle via a reverse proxy. Requires admin mode to be enabled. Enable it with --routermode. Streaming is supported, with a small delay.
    • Model swapping now has an extra option, "initial_model", which refers to the model that was originally loaded.
  • NEW: Auto Unload Timeout - Unloads the current config (and all its loaded models) after a specified number of seconds. Works best with router mode, which allows automatic reloading. You can also reload manually via the admin endpoint.
  • NEW: Qwen3TTS now supports the 1.7B TTS model, with even better voice quality and voice cloning!
  • AceStep 1.5 Music Generation Improvements: Better quality, support for Reference Audio uploads, MP3 outputs (ported from @ServeurpersoCom's MIT-licensed mp3 implementation), better LM defaults that let audio-code generation work more reliably, and stereo output by default. A .kcppt template is recommended for 6GB users.
  • Qwen3TTS loader improvements: supports the --ttsgpu toggle, plus Vulkan speed improvements for Qwen3TTS (CUDA is still slow).
  • NEW: Improved Ollama Emulation - Can now handle requests from endpoints that only support streaming (responses are buffered). However, the OpenAI endpoint is still recommended where supported.
  • NEW: Multiple dynamic LoRAs - --sdlora now supports specifying directories as well. All image LoRAs in the directory become loadable at runtime via the LoRA syntax in your image generation prompt, in the form <lora:filename:multiplier>. Also merged multiple fixes and updates from upstream, including an optional cache mode. Big thanks to @wbruna for the contributions.
  • NEW: Revamped Colab Cloud Notebook: The official KoboldCpp colab notebook has been updated and reworked. Music generation is now enabled, and image gen and text gen can now be used separately.
  • MCP improvements: Added notification support; can now handle simultaneous STDIO requests and requests with multiple parts.
  • Adjustments to forcing --autofit: it now automatically disables moecpu and overridetensors if used together.
  • Smartcache is now disabled if slots is zero. Improved smartcache snapshot logic to conserve slots.
  • Added a warning that RNN models currently do not support the anti-slop sampler.
  • Fixed some single-token phrase bans not registering.
  • OpenAI compatible endpoints now have dynamic IDs and reflect token usage accurately (thanks @gustrd).
  • Updated Kobold Lite, multiple fixes and improvements
  • Merged fixes, new model support, and improvements from upstream, including Nemotron support and Qwen3.5 improvements.
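As an illustration of the new router mode, here is a minimal sketch of an OpenAI-compatible chat request whose "model" field drives the hotswap. The model name is hypothetical, and the payload is only constructed here, not actually sent to a server:

```python
import json

# Sketch: an OpenAI-compatible chat request. With --routermode (and admin
# mode) enabled, naming a different model in the "model" field causes the
# server to hotswap to that model/config before serving the request.
# The model name below is a hypothetical example.
def build_chat_request(model, user_message, stream=False):
    return {
        "model": model,          # router mode swaps to this model/config
        "stream": stream,        # streaming works, with a small delay
        "messages": [{"role": "user", "content": user_message}],
    }

payload = build_chat_request("my-other-model.kcppt", "Hello!")
body = json.dumps(payload).encode("utf-8")
# This body would be POSTed to http://localhost:5001/v1/chat/completions
print(payload["model"])  # -> my-other-model.kcppt
```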
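The dynamic LoRA prompt syntax mentioned above can be sketched with a small helper that prepends a LoRA tag to an image prompt; the filename and multiplier below are illustrative only:

```python
# Sketch of the <lora:filename:multiplier> prompt syntax for dynamic image
# LoRAs loaded from a --sdlora directory. Filename and weight are examples.
def with_lora(prompt, lora_name, weight=1.0):
    return f"<lora:{lora_name}:{weight}> {prompt}"

tagged = with_lora("a watercolor fox", "fox_style", 0.8)
print(tagged)  # -> <lora:fox_style:0.8> a watercolor fox
```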

Download and run the koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), which is a one-file pyinstaller for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try the oldpc version (CUDA 11 + AVX1) instead.
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build first, for best support. Alternatively, you can download our rolling ROCm binary here if you use Linux.
If you're on a modern Mac (Apple M-series), you can use the koboldcpp-mac-arm64 macOS binary.
Click here for .gguf conversion and quantization tools

Run it from the command line with your desired launch parameters (see --help), or manually select the model in the GUI.
Once loaded, you can connect in your browser (or use the full KoboldAI client) at:
http://localhost:5001
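Beyond the browser UI, a running instance can also be queried programmatically. As a sketch, assuming the default port, the following builds a minimal generation request against the KoboldAI-compatible endpoint (the request is only constructed, not sent, so it works without a live server):

```python
import json
import urllib.request

# Sketch: a minimal text-generation request for a locally running KoboldCpp
# instance at the default port. Endpoint and fields follow the
# KoboldAI-compatible API; adjust to match your setup.
req = urllib.request.Request(
    "http://localhost:5001/api/v1/generate",
    data=json.dumps({"prompt": "Once upon a time", "max_length": 64}).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# To actually send it: urllib.request.urlopen(req) with the server running.
print(req.full_url)  # -> http://localhost:5001/api/v1/generate
```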

For more information, be sure to run the program from command line with the --help flag. You can also refer to the readme and the wiki.
