koboldcpp-1.107

is this loss edition

  • Added a new Vulkan (Older PC) option to the oldpc builds. It provides GPU support via Vulkan without any CPU intrinsics (no AVX2, no AVX), replacing the removed CLBlast options.
  • Breaking Changes:
    • Pipeline parallelism is now enabled by default in the CLI. Disable it in the launcher or with --nopipelineparallel.
    • Flash attention is now enabled by default in the CLI. Disable it in the launcher or with --noflashattention. (See the launch example after this list.)
  • Added a few fixes for GLM 4.7 Flash. Note that this model is extremely sensitive to repetition penalty; we recommend disabling rep-pen when using it. Make sure you use a fixed GGUF model, as some early quants were broken. It may help to use the GLM4.5 NoThink template, or to enable forced thinking if you want it.
  • Fixed mcp.json importing and the MCP tool-listing handshake (thanks @Rose22).
  • Changed the MCP user agent string, as some sites were blocking it.
  • Added a fractional-scaling workaround for the GUI launcher on KDE under Wayland.
  • Added support for SDXS, a very fast Stable Diffusion image generation model. It is so fast that it can generate images on pure CPU in under 10 seconds on a Raspberry Pi; on a GPU it can generate images in under half a second. An excellent way to get image generation if you do not have a GPU. For convenience, a GGUF quant of SDXS is provided here. (See the image-generation sketch after this list.)
  • Added support for the ESRGAN 4x upscaler. Load it as an upscaler model to upscale your generated images.
  • Merged image generation improvements and Flux Klein model support from upstream (thanks @wbruna). Get Flux Klein's image model, VAE and text encoder.
  • Added TAE SD support for Flux2; enable it with --sdvaeauto.
  • Increased the hard total image generation resolution limit from 1 megapixel to 1.6 megapixels.
  • Updated SDUI with some quality-of-life fixes by @Riztard.
  • Updated Kobold Lite with multiple fixes and improvements:
    • Added even more themes from @Rose22
    • Added experimental TTS chunked streaming mode (works for all TTS APIs)
    • Added customizable sampler presets from @lubumbax
    • Removed the manual admin state caching panel, since it is made obsolete by --smartcache. The API still exists but should be unnecessary.
  • Merged fixes, model support, and improvements from upstream, including a Vulkan speedup from occam's coopmat1 optimization. Coopmat1 is used by GPUs with matrix cores, such as the AMD 7000 and 9000 series GPUs.
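
For illustration, here is a minimal sketch of a launch that opts back out of both new CLI defaults. The model path is a placeholder; the opt-out flags are exactly those named in the breaking changes above:

    # Sketch: load a model with the new v1.107 defaults disabled again.
    # ./mymodel.gguf is a placeholder path.
    ./koboldcpp-linux-x64 --model ./mymodel.gguf --nopipelineparallel --noflashattention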
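And a sketch of loading an image model alongside a text model, assuming the SDXS quant is passed to --sdmodel (koboldcpp's existing flag for loading a Stable Diffusion model); file names are placeholders:

    # Sketch: serve a text model plus SDXS image generation.
    # For Flux2 image models, adding --sdvaeauto enables the TAE support noted above.
    ./koboldcpp-linux-x64 --model ./mymodel.gguf --sdmodel ./sdxs.gguf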

Important Notice: The CLBlast backend is fully deprecated and has been REMOVED as of this version. If you require CLBlast, you will need to use an earlier version.

Download and run koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), a one-file PyInstaller build for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try the oldpc version instead (CUDA 11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version, which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build first for best support. Alternatively, you can download our rolling ROCm binary here if you use Linux.
If you're on a modern macOS (M-series) machine, you can use the koboldcpp-mac-arm64 macOS binary.
Click here for .gguf conversion and quantization tools

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
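
As a minimal sketch of connecting programmatically rather than through the browser, here is a request against the KoboldAI-compatible generate endpoint; the prompt and parameters are illustrative:

    # Simple completion request against a running instance.
    curl -s http://localhost:5001/api/v1/generate \
      -H "Content-Type: application/json" \
      -d '{"prompt": "Hello,", "max_length": 32}'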

For more information, be sure to run the program from the command line with the --help flag. You can also refer to the readme and the wiki.
