koboldcpp-1.79
One Kobo To Rule Them All Edition
- NEW: Added Multiplayer Support: You can now enable Multiplayer mode on your KoboldCpp instances! Enable it with the `--multiplayer` flag or in the GUI launcher Network tab. Then connect with your browser, open KoboldAI Lite and click the "Join Multiplayer" button.
- Multiplayer allows multiple users to view and edit a KoboldAI Lite session live at the same time! You can take turns chatting with the AI together, host a shared adventure, or collaborate on a shared story, which is automatically synced between all participants.
- Multiplayer mode also gives you an easy way to sync a story/session across multiple of your devices over the network. You can treat it like a temporary online save file.
- To prevent conflicts when two users edit text simultaneously, observe the `(Idle)` or `(Busy)` indicator at the top right corner.
- Multiplayer uses the new endpoints `/api/extra/multiplayer/status`, `/api/extra/multiplayer/getstory` and `/api/extra/multiplayer/setstory`; however, these are only intended for internal use in Kobold Lite and not for third-party integration.
- NEW: Added Ollama API Emulation: Adds the Ollama-compatible endpoints `/api/chat` and `/api/generate`, which provide basic Ollama API emulation. Streaming is not supported. This allows you to use KoboldCpp with amateur 3rd party tools that only support the Ollama API. Simply point that tool to KoboldCpp (at http://localhost:5001 by default, but you may also need to run KoboldCpp on port 11434 for some exceptionally poorly written tools) and connect normally. If the tool you want to use supports the OpenAI API, you're strongly encouraged to use that instead. Here's a sample tool to verify it works. All other KoboldCpp endpoints remain functional and all of them can run at the same time. (A minimal request example is shown after this list.)
- NEW: Added ComfyUI Emulation: Likewise, a new endpoint at `/prompt` emulates a ComfyUI backend, allowing you to use tools that require the ComfyUI API but lack A1111 API support. Right now only txt2img is supported.
- NEW: Speculative Decoding (Drafting) is now added: You can specify a second lightweight text model with the same vocab to perform speculative decoding, which can offer a speedup in some cases.
- The small model drafts tokens which the large model evaluates and accepts/rejects. Output should match the large model's quality.
- Not well supported on Vulkan, will likely be slower.
- Only works well for low temperatures, generally worse for creative writing.
- Added the `/props` endpoint, which provides instruction/chat template data from the model (thanks @kallewoof). A query example is shown after this list.
- Added chunked encoding support (thanks @mkarr)
- Added Version metadata info tags on Windows .exe binaries.
- Restored compatibility support for old Mixtral GGUF models. You should still update them.
- Bugfix for Grammar not being reset, Bugfix for Qwen2.5 missing some UTF-8 characters when streaming.
- GGUF format text encoders (clip/t5) are now supported for Flux and SD3.5
- Updated Kobold Lite, multiple fixes and improvements
- Multiplayer mode support added
- Added a new toggle switch to Adventure mode, "Dice Action", which allows the AI to roll a die to determine the outcome of an action.
- Sentence trimming can now be disabled in all modes.
- Removed some legacy unused features such as pseudostreaming.
- Merged fixes and improvements from upstream, including some nice Vulkan speedups and enhancements
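
To sanity-check the new Ollama emulation without a third-party tool, you can send a plain Ollama-style request yourself. This is a minimal sketch, assuming a default KoboldCpp instance on port 5001 and that the non-streaming reply follows Ollama's usual shape with the generated text in a `response` field; the `model` value is illustrative, since KoboldCpp answers with whichever model it has loaded.

```python
# Minimal sketch of an Ollama-style request against KoboldCpp's emulation layer.
# Assumptions: KoboldCpp is running on the default port 5001, and the
# non-streaming reply follows Ollama's usual shape ({"response": "...", ...}).
import requests

BASE_URL = "http://localhost:5001"  # use port 11434 if your tool insists on Ollama's default

payload = {
    "model": "any-name",   # illustrative; KoboldCpp serves whatever model it loaded
    "prompt": "Say hello in one short sentence.",
    "stream": False,       # streaming is not supported by the emulation
}

resp = requests.post(f"{BASE_URL}/api/generate", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json().get("response"))  # generated text, if the reply matches Ollama's format
```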
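
The new `/props` endpoint can also be queried directly. A minimal sketch, assuming it returns JSON in the spirit of the llama.cpp server's `/props`; the exact field names may differ, so the example just prints whatever comes back.

```python
# Minimal sketch: fetch instruction/chat template info from the new /props endpoint.
# Assumption: the response is JSON; exact field names may vary, so print it all.
import requests

props = requests.get("http://localhost:5001/props", timeout=30)
props.raise_for_status()
print(props.json())
```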
Note: A bug has been found with image model loading. A patch is being developed to fix this.
To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
If you have an Nvidia GPU, but use an old CPU and koboldcpp.exe does not work, try koboldcpp_oldcpu.exe
If you have a newer Nvidia GPU, you can use the CUDA 12 version koboldcpp_cu12.exe (much larger, slightly faster).
If you're using Linux, select the appropriate Linux binary file instead (not exe).
If you're on a modern Mac (M1, M2, M3), you can try the koboldcpp-mac-arm64 MacOS binary.
If you're using AMD, we recommend trying the Vulkan option (available in all releases) first, for best support. Alternatively, you can try koboldcpp_rocm at YellowRoseCx's fork here
Run it from the command line with the desired launch parameters (see `--help`), or manually select the model in the GUI.
Once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
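
If you prefer to script against the server instead of using the browser UI, a minimal sketch of a generation request to the native KoboldAI-style API might look like the following; parameter values are illustrative, and the assumed response shape (`results[0].text`) follows the KoboldAI API convention.

```python
# Minimal sketch: a single text generation request against KoboldCpp's
# native KoboldAI-style endpoint. Parameter values are illustrative.
import requests

payload = {
    "prompt": "The quick brown fox",
    "max_length": 64,      # number of tokens to generate
    "temperature": 0.7,
}

resp = requests.post("http://localhost:5001/api/v1/generate", json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["results"][0]["text"])  # generated continuation
```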
For more information, be sure to run the program from the command line with the `--help` flag. You can also refer to the readme and the wiki.