koboldcpp-1.103
  • NEW: Added support for Flux2 and Z-Image Turbo! Another big thanks to @leejet for the sd.cpp implementation and @wbruna for the assistance with testing and merging.
    • To obtain models for Z-Image (most recommended; it's lightweight):
    • To obtain models for Flux2 (not recommended: this model is huge, so I will link the Q2_K; remember to enable CPU offload, and running anything larger requires a very powerful GPU):
      • Get the image model here
      • Get the VAE here
      • Get the text encoder here, and load it as Clip 1 (a sketch of calling the image API once a model is loaded follows this list)
  • NEW: Mistral and Ministral 3 model support has been merged from upstream.
  • Improved "Assistant Continue" in llama.cpp UI mode, now can be used to continue partial turns.
    • We have added prefill support to chat completions if you have /lcpp in your URL (/lcpp/v1/chat/completions), the regular chat completions is meant to mimick OpenAI and does not do this. Point your frontend to the URL that most fits your use case, we'd like feedback on which one of these you prefer and if the /lcpp behavior would break an existing use case.
  • Minor tool calling fix to avoid passing base64 media strings into the tool call.
  • Tweaked resizing behavior of the launcher UI.
  • Added a secondary terminal UI for viewing the console logging (Linux only); it can be used even when koboldcpp was not launched from the CLI. Launch this auxiliary terminal from the Extras tab.
  • AutoGuess Template fixes for GPT-OSS and Kimi.
  • Fixed a bug with --showgui mode being saved into some configs.
  • Updated Kobold Lite, with multiple fixes and improvements.
  • Merged fixes and improvements from upstream.
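If you want to drive the new image models programmatically, here is a minimal sketch against an A1111-style txt2img route on the default port. The route name, payload fields, and response shape are assumptions based on common A1111-compatible servers, not confirmed by these notes; check --help and the wiki for the exact API.

```python
import base64
import requests

# Hypothetical A1111-compatible txt2img route on koboldcpp's default port;
# this only works if an image model (e.g. Z-Image Turbo) was loaded at launch.
URL = "http://localhost:5001/sdapi/v1/txt2img"

payload = {
    "prompt": "a lighthouse at dusk, oil painting",
    "width": 512,
    "height": 512,
    "steps": 8,  # Z-Image Turbo is a few-step model, so a low step count is typical
}

resp = requests.post(URL, json=payload, timeout=300)
resp.raise_for_status()
image_b64 = resp.json()["images"][0]  # images are assumed to return base64-encoded

with open("out.png", "wb") as f:
    f.write(base64.b64decode(image_b64))
```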
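And a minimal sketch of the /lcpp prefill behavior described above, assuming the default port and that the route accepts an OpenAI-style payload, where a trailing assistant message is the partial turn to continue:

```python
import requests

# Assumed request shape: OpenAI-style messages where the final entry is a
# partial assistant turn. Under /lcpp, the server should continue that turn
# rather than start a fresh reply; the regular endpoint would not.
URL = "http://localhost:5001/lcpp/v1/chat/completions"

payload = {
    "messages": [
        {"role": "user", "content": "Write a haiku about autumn."},
        # Trailing assistant message acts as the prefill to be continued.
        {"role": "assistant", "content": "Crisp leaves underfoot"},
    ],
    "max_tokens": 64,
}

resp = requests.post(URL, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```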

Download and run koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), which is a one-file PyInstaller build for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try the oldpc version instead (CUDA 11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version, which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build for the best support.
If you're on a modern macOS device (M-series), you can use the koboldcpp-mac-arm64 binary.
Click here for .gguf conversion and quantization tools

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once the model is loaded, connect via your browser (or the full KoboldAI client) at:
http://localhost:5001
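If you'd rather connect programmatically than through the browser, a minimal sketch against the KoboldAI-style generate endpoint on the default port might look like this; the exact payload fields and response shape are assumptions, so consult the wiki for the authoritative schema.

```python
import requests

# Assumed KoboldAI-style generate endpoint on koboldcpp's default port.
URL = "http://localhost:5001/api/v1/generate"

payload = {
    "prompt": "Once upon a time,",
    "max_length": 80,  # number of tokens to generate (assumed field name)
}

resp = requests.post(URL, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["results"][0]["text"])
```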

For more information, be sure to run the program from command line with the --help flag. You can also refer to the readme and the wiki.
