koboldcpp-1.106
MCP for the masses edition
- NEW: MCP Server and Client Support Added to KoboldCpp - KoboldCpp now supports running an MCP bridge that serves as a direct drop-in replacement for Claude Desktop.
- KoboldCpp can connect to any HTTP or STDIO MCP server, using an `mcp.json` config format compatible with Claude Desktop.
- Multiple servers are supported; KoboldCpp will automatically combine their tools and dispatch requests appropriately.
- Recommended guide for MCP newbies: Here is a simple guide on running a Filesystem MCP Server to let your AI browse files locally on your PC and search the web - https://github.com/LostRuins/koboldcpp/wiki#mcp-tool-calling
- CAUTION: Running ANY MCP server gives it full access to your system. Its third-party scripts will be able to modify and make changes to your files. Be sure to only run servers you trust!
- The example music-playing MCP server used in the screenshot above was this audio-player-mcp
- Flash Attention is now enabled by default when using the GUI launcher.
- Improvements to tool parsing (thanks @AdamJ8)
- API field `continue_assistant_turn` is now enabled by default in all chat completions (assistant prefill)
- Interrogate image max length increased
- Various StableUI fixes by @Riztard
- Setting the environment variable `GGML_VK_VISIBLE_DEVICES` externally now always overrides whatever Vulkan device settings are configured in KoboldCpp.
- Updated Kobold Lite, multiple fixes and improvements
- Merged fixes, model support, and improvements from upstream
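The multi-server MCP setup described above can be sketched with a minimal `mcp.json`. The server package shown is the Filesystem MCP server from the linked guide; the directory path is a hypothetical example, and the format mirrors Claude Desktop's `mcpServers` layout:

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/home/user/docs"]
    }
  }
}
```

Additional servers are added as sibling entries under `mcpServers`; KoboldCpp combines their tools into one set.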
Important Notice: The CLBlast backend may be removed soon, as it is very outdated and no longer receives any updates, fixes, or improvements. It can be considered superseded by the Vulkan backend. If you have concerns, please join the discussion here.
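As noted in the changelog above, setting `GGML_VK_VISIBLE_DEVICES` externally now overrides any Vulkan device selection made inside KoboldCpp. A minimal sketch (Linux shell; the launch line is illustrative and commented out):

```shell
# Pin the Vulkan backend to GPU index 0. Because this is set externally,
# it takes precedence over device settings chosen in KoboldCpp.
export GGML_VK_VISIBLE_DEVICES=0

# ./koboldcpp-linux-x64 --usevulkan   # then launch as usual
echo "$GGML_VK_VISIBLE_DEVICES"
```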
Download and run the koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), which is a one-file pyinstaller for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try the oldpc version instead (CUDA 11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build for best support.
If you're on a modern macOS device (M-series), you can use the koboldcpp-mac-arm64 macOS binary.
Click here for .gguf conversion and quantization tools
Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once the model is loaded, you can connect in your browser (or use the full KoboldAI client) at:
http://localhost:5001
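Besides the browser UI, the server also answers API requests on the same port. The sketch below builds a request body for the OpenAI-compatible chat completions endpoint and shows assistant prefill: with `continue_assistant_turn` now enabled by default (see the changelog above), a trailing assistant message is continued rather than restarted. The prompt text and token limit are illustrative:

```python
import json

# Example body for http://localhost:5001/v1/chat/completions.
# The final assistant message is a prefill: the model continues
# "Crisp leaves drift and" instead of starting a fresh reply.
payload = {
    "max_tokens": 64,
    "messages": [
        {"role": "user", "content": "Write a haiku about autumn."},
        {"role": "assistant", "content": "Crisp leaves drift and"},
    ],
}
body = json.dumps(payload)
```

Send `body` with any HTTP client, e.g. `curl -X POST -H "Content-Type: application/json" -d @- http://localhost:5001/v1/chat/completions`.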
For more information, be sure to run the program from command line with the --help flag. You can also refer to the readme and the wiki.