koboldcpp-1.109
- SmartCache improvements - SmartCache should now work better for RNN/hybrid models like Qwen 3.5. Additionally, SmartCache is automatically enabled when using such models for a smoother experience, unless fast forwarding is disabled.
- NEW: Added experimental support for Music Generation via Ace Step - KoboldCpp now optionally supports generating music natively in as little as 4GB of VRAM, thanks to @ServeurpersoCom's acestep.cpp.
- Requires 4 files (AceStep LM, diffusion, embedder and VAE), which can be found at https://huggingface.co/koboldcpp/music/tree/main. For your convenience we made templates: the recommended option is the 1.7B LM for 6GB of VRAM. You can also try alternative templates: 1.7B for 4GB of VRAM and 4B for 6GB of VRAM (both are a tight fit, not recommended as a first option), as well as 4B for 8GB of VRAM and 4B for 10GB of VRAM.
- When a music model is loaded, a brand new UI is available at http://localhost:5001/musicui
- New CLI args added: `--musicllm`, `--musicdiffusion`, `--musicembeddings`, `--musicvae` and `--musiclowvram`.
- To keep KoboldCpp lightweight, our implementation re-uses the existing GGML libraries from llama.cpp; we are currently waiting on ace-step.cpp to upstream its GGML improvements.
- As usual, the ace-step specific backend components are only loaded if you are trying to load a music generation model; if you only wish to use KoboldCpp for text generation, this addition does not impact your performance or memory usage.
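As a sketch, launching with the new music flags could look like the following. The four model files correspond to the AceStep LM, diffusion model, embedder and VAE mentioned above; the .gguf filenames here are placeholders, not the actual filenames on the Hugging Face repo.

```shell
# Hypothetical music generation launch (filenames are placeholders).
# --musiclowvram is optional and trades some speed for lower VRAM use.
./koboldcpp-linux-x64 \
  --musicllm acestep-lm.gguf \
  --musicdiffusion acestep-diffusion.gguf \
  --musicembeddings acestep-embedder.gguf \
  --musicvae acestep-vae.gguf \
  --musiclowvram
```

Once loaded, the music UI described below is served alongside the normal endpoints.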
- NEW: Added Qwen3-TTS support with high quality voice cloning - Finally, support for Qwen3-TTS has been added from @predict-woo's qwen3-tts.cpp. This allows for high quality voice cloning on the level of XTTS, and much better than the OuteTTS one.
- You'll need the TTS model and the qwen3tts tokenizer, remember to also specify a TTS directory if you want to use voice cloning.
- Specify a directory of short voice audio samples (.mp3 or .wav) with `--ttsdir`, and you'll be able to use TTS narration with those voices.
- For the fastest generation speed, use Vulkan.
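A minimal launch sketch for voice cloning, assuming KoboldCpp's existing `--ttsmodel` and `--ttswavtokenizer` flags are reused for the Qwen3-TTS model and tokenizer (only `--ttsdir` is named in these notes; confirm the other flag names with --help). Filenames are placeholders.

```shell
# Sketch of a Qwen3-TTS launch with voice cloning (filenames are placeholders).
# --ttsmodel / --ttswavtokenizer are assumptions based on existing TTS flags.
# ./voices should contain short .mp3 or .wav voice samples for cloning.
./koboldcpp-linux-x64 \
  --ttsmodel qwen3-tts.gguf \
  --ttswavtokenizer qwen3tts-tokenizer.gguf \
  --ttsdir ./voices
```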
- Fix follow-up tool call check with assistant prefills
- Fixed image importing in SDUI
- Config packing improvements as part of a minor sd.cpp update from @wbruna
- Fixed a wav header packing issue that could cause a click in output audio
- Relaxed size restrictions in image gen; also supports high-res reference images.
- `--admindir` now also indexes subdirectories up to 1 level deep.
- Show timestamps when image gen is completed
- Updated Kobold Lite, multiple fixes and improvements
- Merged fixes, new model support, and improvements from upstream
Download and run the koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), which is a one-file pyinstaller for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try the oldpc version instead (CUDA 11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build first, for best support. Alternatively, you can download our rolling ROCm binary here if you use Linux.
If you're on a modern MacOS (M-Series) you can use the koboldcpp-mac-arm64 MacOS binary.
Click here for .gguf conversion and quantization tools
Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
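Besides the web UI, the running server also exposes the KoboldAI-compatible HTTP API. A minimal generation request could look like this (the prompt and parameters are illustrative; this assumes the server is running on the default port):

```shell
# Minimal text generation request against the KoboldAI-compatible API.
# Requires a model to already be loaded on the local server.
curl -s http://localhost:5001/api/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, world", "max_length": 50}'
```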
For more information, be sure to run the program from command line with the --help flag. You can also refer to the readme and the wiki.