github LostRuins/koboldcpp v1.105
koboldcpp-1.105



new year edition

  • NEW: Added --gendefaults, which accepts a JSON dictionary of API fields (e.g. step count, temperature, top_k) to append to or overwrite on incoming payloads. This can be useful with frontends that don't behave well, as you can override or correct whatever fields they send to koboldcpp.
    • Note: this marks the horde worker with a debug flag if used on AI Horde.
    • --sdgendefaults has been deprecated and merged into this flag.
  • Added support for a new "Adaptive-P" sampler by @MrJackSpade, which allows selecting lower-probability tokens. Recommended for use together with min-P; configure with the adaptive target and adaptive decay parameters. This sampler may be subject to change in the future.
  • StableUI (SDUI): Fixed generation queue stacking, allowed requesting AVI formatted videos (enable in settings first), added a dismiss button, and made various small tweaks.
  • Minor fixes to tool calling
  • Added support for Ovis Image and the new Qwen Image Edit, plus support for TAEHV as a WAN VAE (usable with Wan2.2 videos and Qwen Image/Qwen Image Edit; simply enable the "TAE SD" checkbox or pass --sdvaeauto, which greatly saves memory). Thanks @wbruna for the sync.
  • Fixed LoRA loading issues with some Qwen Image LoRAs
  • --autofit now allocates some extra space when used with multiple models (image gen, embeddings, etc.)
  • Improved snapshotting logic with --smartcache for RNN models.
  • Attempted to fix tk scaling on some systems.
  • Renamed the KCPP launcher's Tokens tab to Context, and moved the Flash Attention toggle into the Hardware tab.
  • Updated Kobold Lite, multiple fixes and improvements
    • Added support for using remote HTTP MCP servers for tool calling. KoboldCpp-based MCP support may be added at a later date.
  • Merged fixes, model support, and improvements from upstream
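
As a hedged illustration of the new --gendefaults flag described above: the exact JSON shape, and whether the flag takes an inline string or a file path, should be confirmed with --help, but an invocation might look like this (the model path and field values are placeholders):

```shell
# Hypothetical launch: force temperature and top_k onto every incoming API payload.
# Field names here mirror common KoboldCpp API sampler fields; exact quoting depends on your shell.
./koboldcpp-linux-x64 --model mymodel.gguf \
  --gendefaults '{"temperature": 0.7, "top_k": 40}'
```

With this, any payload a misbehaving frontend sends would have those fields overridden or appended before generation.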

Important Notice: The CLBlast backend may be removed soon, as it is very outdated and no longer receives any updates, fixes, or improvements. It can be considered superseded by the Vulkan backend. If you have concerns, please join the discussion here.

Download and run koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), a one-file pyinstaller build for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try the oldpc version instead (CUDA 11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build for best support.
If you're on modern macOS (M-series), you can use the koboldcpp-mac-arm64 binary.
Click here for .gguf conversion and quantization tools

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once the model is loaded, you can connect at http://localhost:5001 (or use the full KoboldAI client).
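
Frontends talk to koboldcpp over its KoboldAI-compatible HTTP API. A minimal sketch, assuming the default port and the standard /api/v1/generate endpoint (the prompt and sampler values are illustrative):

```shell
# Send a generation request to a running koboldcpp instance;
# the response is JSON containing a "results" array with the generated text.
curl -s http://localhost:5001/api/v1/generate \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Once upon a time", "max_length": 80, "temperature": 0.7}'
```

This requires a model to already be loaded and serving on port 5001.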

For more information, be sure to run the program from command line with the --help flag. You can also refer to the readme and the wiki.
