koboldcpp-1.100
I-can't-believe-it's-not-version-2.0-edition
- NEW: WAN Video Generation has been added to KoboldCpp!
  - You can now generate short videos in KoboldCpp using the WAN model. Special thanks to @leejet for the sd.cpp implementation, and @wbruna for help merging and QoL fixes.
  - Note: WAN requires a LOT of VRAM to run. If you run out of memory, try generating fewer frames and using a lower resolution. Especially on Vulkan, the VAE buffer size may be too large; use `--sdvaecpu` to run the VAE on CPU instead. For comparison, 30 frames (2 seconds) of a 384x576 video will still require about 16GB of VRAM even with VAE on CPU and CPU offloading enabled. You can also generate a single frame, in which case it will behave like a normal image generation model.
  - Obtain the WAN2.2 14B rapid mega AIO model here. This is the most versatile option and can do both T2V and I2V. I do not recommend using the 1.3B WAN2.1 or the 5B WAN2.2; they both produce rather poor results. If you really don't care about quality, you can use the small 1.3B from here.
  - Next, you will need the correct VAE and UMT5-XXL. Note that some WAN models use different ones, so if you're bringing your own, do check it. Reference links are here.
  - Load them all via the GUI launcher or by using `--sdvae`, `--sdmodel` and `--sdt5xxl` (an example launch command follows this list).
  - Launch KoboldCpp and open SDUI at http://localhost:5001/sdui. I recommend starting with something small, like 15 frames of a 384x384 video with 20 steps. Be prepared to wait a few minutes. The video will be rendered and saved to SDUI when done!
  - It's recommended to use `--sdoffloadcpu` and `--sdvaecpu` if you don't have enough VRAM. The VAE buffer can really be huge.
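A minimal sketch of a launch command combining the flags above; the model filenames are placeholders, so substitute whatever files you actually downloaded:

```
# Hypothetical filenames - use your own WAN model, VAE and UMT5-XXL files.
./koboldcpp-linux-x64 --sdmodel wan2.2-rapid-aio.safetensors \
  --sdvae wan_vae.safetensors --sdt5xxl umt5-xxl.safetensors \
  --sdoffloadcpu --sdvaecpu
```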
- Added additional toggle flags for image generation:
  - `--sdoffloadcpu` - Allows image generation weights to be dynamically loaded/unloaded to RAM when not in use, e.g. during VAE decoding.
  - `--sdvaecpu` - Performs VAE decoding on CPU using RAM instead.
  - `--sdclipgpu` - Performs CLIP/T5 decoding on GPU instead (new default is CPU).
- Updated StableUI to support animations/videos. If you want to perform I2V (Image-To-Video), you can do so in the txt2img panel.
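If you prefer scripting to the web UI, KoboldCpp also exposes an A1111-compatible image generation endpoint. A minimal single-frame request (which, as noted above, behaves like normal image generation) might look like this; the prompt, size and step values are just illustrative:

```
# Single-frame request via the A1111-compatible API; payload values are arbitrary examples.
curl -s http://localhost:5001/sdapi/v1/txt2img \
  -H "Content-Type: application/json" \
  -d '{"prompt": "a lighthouse at dusk", "width": 384, "height": 384, "steps": 20}'
```

Following the A1111 API convention, the result comes back as base64 image data in the response's `images` array.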
- Renamed `--sdclipl` to `--sdclip1`, and `--sdclipg` to `--sdclip2`. These flags are now used whenever there is a vision encoder to be used (e.g. WAN's clip_vision if applicable).
- Disabled TAESD when not applicable.
- Moved all `.embd` resource files into a separate directory for improved organization. Also extracted the image generation vocabs into their own files.
- Moved the `lowvram` CUDA option into a new flag `--lowvram` (same as `-nkvo`), which can be used in both CUDA and Vulkan to avoid offloading the KV cache. Note: this is slow and not generally recommended.
- Fixed Kimi template, added Granite 4 template.
- Enabled building for CUDA 13 in CMake; however, it's untested and no binaries will be provided. Also fixed Vulkan noext compiles.
- Fixed q4_0 repacking incoherence when running CPU-only, a regression introduced in v1.98.
- Fixed FastForwarding issues caused by misidentified hybrid/RNN models; this should no longer happen.
- Added `--sdgendefaults` to allow setting some default image generation parameters.
- On admin config reload, reset nonexistent fields in the config to default values instead of keeping the old value.
- Updated Kobold Lite, multiple fixes and improvements:
  - Set default filenames based on the slot's name when downloading from a saved slot.
  - Added `dry_penalty_last_n` from @joybod, which decouples the DRY range from the rep pen range (see the sketch below this list).
  - LaTeX rendering fixes, autoscroll fixes, various small tweaks.
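To illustrate the decoupling, here's a sketch of a generate request where the DRY window and the rep pen range differ. This assumes `dry_penalty_last_n` is accepted by the generate API alongside the existing DRY fields, and all values are arbitrary:

```
# Assumes dry_penalty_last_n is exposed on /api/v1/generate like the other DRY parameters.
curl -s http://localhost:5001/api/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Once upon a time", "max_length": 64, "rep_pen": 1.07, "rep_pen_range": 320, "dry_multiplier": 0.8, "dry_penalty_last_n": 2048}'
```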
- Merged new model support (including GLM4.6 and Granite 4), plus fixes and improvements from upstream
Download and run the koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), which is a one-file pyinstaller for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try the oldpc version instead (CUDA 11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build first, for best support. Alternatively, you can try koboldcpp_rocm at YellowRoseCx's fork here if you are a Windows user or download our rolling ROCm binary here if you use Linux.
If you're on a modern MacOS (M-Series) you can use the koboldcpp-mac-arm64 MacOS binary.
Click here for .gguf conversion and quantization tools
Run it from the command line with the desired launch parameters (see `--help`), or manually select the model in the GUI.
Once loaded, you can connect like this (or use the full koboldai client): http://localhost:5001
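Besides the browser UI, the same port serves the standard KoboldAI API. For example, a quick sanity check that the server is up and which model is loaded:

```
# GET the loaded model name via the standard KoboldAI API.
curl -s http://localhost:5001/api/v1/model
```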
For more information, be sure to run the program from the command line with the `--help` flag. You can also refer to the readme and the wiki.