koboldcpp-1.114

kobo.mp4

NEW: Experimental parallel text generation requests (continuous batching) is now optionally supported - Normal text gen requests can be executed in parallel instead of queued in this experimental mode. To use, set --parallelrequests X where X is the number of simultaneous generations to allow at one time, or set parallel requests in network tab in the GUI. This can be useful when hosting for many users or on Horde. Note that this is a very experimental feature, support is limited. (Thanks @AlpinDale)
- Limitations: Only supports text gen requests with basic samplers, no multimodal, no special stateful samplers (e.g. antislop, grammar).
- Limitations: Cannot be used together with context shifting or fast forwarding.
NEW: RPC backend support is added - RPC allows multiple GPUs to be shared over the network, allowing for distributed generation with multiple computers! Note that it's not advised to expose RPC to the public internet! Set it up from the Network tab, or use --rpcmode and --rpctargets (if connecting). Default --rpcport is 5551. Thanks @Neresco for initial reference implementation.
- --rpcmode connect = access a remote GPU
- --rpcmode host = share your GPU
NEW: Support for LTX2.3 video generation added! As usual, credits to @leejet from stable-diffusion.cpp for the original implementation. It works, although it's still somewhat buggy with some setups.
- Quick Start Guide (16GB VRAM needed):
- First, download the latest KoboldCpp.
- Download and load this ltx2.3.kcppt template, open koboldcpp, load the config and select your preferred GPU backend (I used Vulkan). This will download 5 other models (ltx2.3-22b-distill, gemma3-12b, embedding-connector, audio-vae and video-vae), we will use the Distilled versions. Launch Kobold and select youe
- Once loaded, launch http://localhost:5001/sdui when done (if you're remote, you can enable remote tunnel for a cloudflare link)
- There are 2 options, txt2vid and img2vid. I'd recommend starting with a simple txt2vid, just setup a basic prompt, make sure steps are about 15 and guidance cfg is 1. Do not exceed 512x512 for now. 50 frames should work for a quick demo, that's about 4 seconds of video.
- In SDUI, enable AVI downloads in settings.
- Generate! It should take less than 1 min on a good GPU. When done, a GIF will be shown with the generated video.
- To get the AVI version (with audio), click on the video, and select [Download AVI]
Increased max frame limit, added FPS controls, added option to continue a generated video from final frame.
NEW: Support for many other new Image Generation models are also added! (Thanks @wbruna)
Also added support for setting individual devices for CLIP and VAE for image generation (Thanks @wbruna)
SDUI send as reference image toggle for img2img is now enabled by default.
Allow loading custom TAE (Tiny Auto Encoder) image VAEs in the VAE file select.
NEW: Multithreaded MP3 generation for AceStep - Multithreaded MP3s generate much faster, default also changed to 192kbps audio (thanks @askmyteapot)
QoL Fix: You can now easily toggle thinking with --jinjathink [default/true/false] or using the GUI dropdown. Internally it just sets the thinking_enabled jinja kwarg.
Jinja kwargs object can now be passed via API as chat_template_kwargs.
Breaking change - make GPU ID list in the GUI launcher 0-based index (0,1,2,3) instead of 1-based (1,2,3,4) index to match the CLI.
Breaking change - SWA is now enabled by default on all models that support it (was off by default previously). You can disable it with --noswa, or by unchecking the SWA checkbox in the GUI launcher. Models without SWA are unaffected.
Autofit now accounts properly for the exact memory taken by mmproj files
Revert AceStep VAE chunk size back to 256 (see #2224)
Added more options to GUI context size slider
Fixed a potential vulnerability when --onready was used with admin mode, .kcpps swapping can no longer trigger onready commands.
Updated Kobold Lite, multiple fixes and improvements
- Allow setting Sampler and Scheduler for ComfyUI remote endpoints, and A1111 compatible endpoints in Lite.
- Added more image gen resolution options
- Added aria labels for classic UI separators
- Fixed a bug that broke chat avatars in chat mode
- Reworked and unified the Import Character from Website URL option. Added support for BotBooru.
Merged fixes, new model support, and improvements from upstream

Download and run the koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), which is a one-file pyinstaller for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try oldpc version instead (Cuda11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build first, for best support. Alternatively, you can download our rolling ROCm binary here if you use Linux.
If you're on a modern MacOS (M-Series) you can use the koboldcpp-mac-arm64 MacOS binary.
Click here for .gguf conversion and quantization tools
Newer rolling experimental builds can be found here, these are auto-updated and may be unstable.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
and then once loaded, you can connect like this (or use the full koboldai client):
http://localhost:5001

For more information, be sure to run the program from command line with the --help flag. You can also refer to the readme and the wiki.

LostRuins/koboldcpp v1.114 koboldcpp-1.114 on GitHub

koboldcpp-1.114

LostRuins/koboldcpp v1.114
koboldcpp-1.114

on GitHub