koboldcpp-1.100
I-can't-believe-it's-not-version-2.0-edition
- NEW: WAN Video Generation has been added to KoboldCpp!
  - You can now generate short videos in KoboldCpp using the WAN model. Special thanks to @leejet for the sd.cpp implementation, and @wbruna for help merging and QoL fixes.
  - Note: WAN requires a LOT of VRAM to run. If you run out of memory, try generating fewer frames and using a lower resolution. Especially on Vulkan, the VAE buffer size may be too large; use `--sdvaecpu` to run the VAE on CPU instead. For comparison, 30 frames (2 seconds) of a 384x576 video will still require about 16GB of VRAM even with VAE on CPU and CPU offloading enabled. You can also generate a single frame, in which case it will behave like a normal image generation model.
  - Obtain the WAN2.2 14B rapid mega AIO model here. This is the most versatile option and can do both T2V and I2V. I do not recommend using the 1.3B WAN2.1 or the 5B WAN2.2; they both produce rather poor results. If you really don't care about quality, you can use the small 1.3B from here.
  - Next, you will need the correct VAE and UMT5-XXL. Note that some WAN models use different ones, so if you're bringing your own, do check it. Reference links are here.
  - Load them all via the GUI launcher or by using `--sdvae`, `--sdmodel` and `--sdt5xxl` (an example launch command follows this list).
  - Launch KoboldCpp and open SDUI at http://localhost:5001/sdui. I recommend starting with something small, like 15 frames of a 384x384 video with 20 steps. Be prepared to wait a few minutes. The video will be rendered and saved to SDUI when done!
  - It's recommended to use `--sdoffloadcpu` and `--sdvaecpu` if you don't have enough VRAM. The VAE buffer can really be huge.
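A minimal sketch of a launch command combining the flags above; the model filenames are placeholders, so substitute whatever files you actually downloaded:

```
# Hypothetical filenames - use your own WAN model, VAE and UMT5-XXL files.
./koboldcpp-linux-x64 --sdmodel wan2.2-rapid-aio.safetensors \
  --sdvae wan_vae.safetensors --sdt5xxl umt5-xxl.safetensors \
  --sdoffloadcpu --sdvaecpu
```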
- Added additional toggle flags for image generation:
  - `--sdoffloadcpu` - Allows image generation weights to be dynamically loaded/unloaded to RAM when not in use, e.g. during VAE decoding.
  - `--sdvaecpu` - Performs VAE decoding on CPU using RAM instead.
  - `--sdclipgpu` - Performs CLIP/T5 decoding on GPU instead (new default is CPU).
- Updated StableUI to support animations/videos. If you want to perform I2V (Image-To-Video), you can do so in the txt2img panel.
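If you prefer scripting to the web UI, KoboldCpp also exposes an A1111-compatible image generation endpoint. A minimal single-frame request (which, as noted above, behaves like normal image generation) might look like this; the prompt, size and step values are just illustrative:

```
# Single-frame request via the A1111-compatible API; payload values are arbitrary examples.
curl -s http://localhost:5001/sdapi/v1/txt2img \
  -H "Content-Type: application/json" \
  -d '{"prompt": "a lighthouse at dusk", "width": 384, "height": 384, "steps": 20}'
```

Following the A1111 API convention, the result comes back as base64 image data in the response's `images` array.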
- Renamed `--sdclipl` to `--sdclip1`, and `--sdclipg` to `--sdclip2`. These flags are now used whenever there is a vision encoder to be used (e.g. WAN's clip_vision if applicable).
- Disabled TAESD when not applicable.
- Moved all `.embd` resource files into a separate directory for improved organization. Also extracted the image generation vocabs into their own files.
- Moved the `lowvram` CUDA option into a new flag `--lowvram` (same as `-nkvo`), which can be used in both CUDA and Vulkan to avoid offloading the KV cache. Note: this is slow and not generally recommended.
- Fixed Kimi template, added Granite 4 template.
- Enabled building for CUDA 13 in CMake; however, it's untested and no binaries will be provided. Also fixed Vulkan noext compiles.
- Fixed q4_0 repacking incoherence when running CPU-only, a regression introduced in v1.98.
- Fixed FastForwarding issues caused by misidentified hybrid/RNN models; this should no longer happen.
- Added `--sdgendefaults` to allow setting some default image generation parameters.
- On admin config reload, reset nonexistent fields in the config to default values instead of keeping the old value.
- Updated Kobold Lite, multiple fixes and improvements:
  - Set default filenames based on the slot's name when downloading from a saved slot.
  - Added `dry_penalty_last_n` from @joybod, which decouples the DRY range from the rep pen range (see the sketch below this list).
  - LaTeX rendering fixes, autoscroll fixes, various small tweaks.
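To illustrate the decoupling, here's a sketch of a generate request where the DRY window and the rep pen range differ. This assumes `dry_penalty_last_n` is accepted by the generate API alongside the existing DRY fields, and all values are arbitrary:

```
# Assumes dry_penalty_last_n is exposed on /api/v1/generate like the other DRY parameters.
curl -s http://localhost:5001/api/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Once upon a time", "max_length": 64, "rep_pen": 1.07, "rep_pen_range": 320, "dry_multiplier": 0.8, "dry_penalty_last_n": 2048}'
```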
- Merged new model support (including GLM4.6 and Granite 4), plus fixes and improvements from upstream
Download and run the koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), which is a one-file pyinstaller for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try the oldpc version instead (CUDA 11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build first, for best support. Alternatively, you can try koboldcpp_rocm at YellowRoseCx's fork here if you are a Windows user or download our rolling ROCm binary here if you use Linux.
If you're on a modern MacOS (M-Series) you can use the koboldcpp-mac-arm64 MacOS binary.
Click here for .gguf conversion and quantization tools
Run it from the command line with the desired launch parameters (see `--help`), or manually select the model in the GUI.
Once loaded, you can connect like this (or use the full koboldai client): http://localhost:5001
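Besides the browser UI, the same port serves the standard KoboldAI API. For example, a quick sanity check that the server is up and which model is loaded:

```
# GET the loaded model name via the standard KoboldAI API.
curl -s http://localhost:5001/api/v1/model
```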
For more information, be sure to run the program from the command line with the `--help` flag. You can also refer to the readme and the wiki.