koboldcpp-1.77
the road not taken edition
- NEW: Token Probabilities (logprobs) are now available over the API! Currently they are only supplied over the sync API (non-streaming), but a second dedicated logprobs endpoint, `/api/extra/last_logprobs`, is also provided. If "logprobs" is enabled in KoboldAI Lite settings, a link to view alternate token probabilities is provided for both streaming and non-streaming responses. This will also work in SillyTavern when streaming is disabled, once the latest build is out. (A request sketch follows the changelog below.)
- Response `prompt_tokens`, `completion_tokens` and `total_tokens` are now accurate values instead of placeholders.
- Enabled CUDA graphs for the cuda12 build, which can improve performance on some cards.
- Fixed a bug where .wav audio files uploaded directly to the `/v1/audio/transcriptions` endpoint were fragmented and cut off early. Audio sent as base64 within JSON payloads is unaffected. (See the transcription sketch after this list.)
- Fixed a bug where Whisper transcription blocked generation in non-multiuser mode.
- Fixed a bug where `trim_stop` did not remove a stop sequence that was divided across multiple tokens in some cases.
- Significantly increased the maximum limits for stop sequences, anti-slop token bans, logit biases and DRY sequence breakers (thanks to @mayaeary for the PR, which changes the way some parameters are passed to the CPP side).
- Added a link to the help page if the user fails to select a model.
- The Flash Attention toggle in the GUI quick launcher is now hidden by default if Vulkan is selected (it usually reduces performance there).
- Updated Kobold Lite, with multiple fixes and improvements:
  - NEW: Experimental ComfyUI support added! ComfyUI can now be used as an image generation backend API from within KoboldAI Lite. No workflow customization is necessary. Note: ComfyUI must be launched with the flags `--listen --enable-cors-header '*'` to enable API access. Then you may use it normally like any other image gen backend.
  - Clarified the option for selecting A1111/Forge/KoboldCpp as an image gen backend, since Forge is gradually superseding A1111. This option is compatible with all three of the above.
  - You can now generate images from instruct mode via natural language, similar to ChatGPT (e.g. "Please generate an image of a bag of sand"). This requires an image model to be loaded; it uses regex matching and is enabled by default, but can be disabled in settings.
  - Added support for Tavern "V3" character cards. Strictly speaking, V3 is not a real format; it is an augmented V2 card used by Risu that adds additional metadata chunks. Those chunks are not supported in Lite, but the base V2 card functionality will work.
  - Added a new scenario, "Interactive Storywriter": similar to story writing mode, but it allows you to secretly steer the story with hidden instruction prompts.
  - Added a Token Probability Viewer: you can now see a table of alternative token probabilities in responses. Disabled by default; enable it in advanced settings.
  - Fixed JSON file selection problems in some mobile browsers.
  - Fixed the Aetherroom importer.
  - Minor Corpo UI layout tweaks by @Ace-Lite.
- Merged fixes and improvements from upstream
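
For the new logprobs support, a minimal round trip might look like the sketch below. This is a sketch only: the `logprobs` request field and the exact request/response shape of `/api/extra/last_logprobs` are assumptions based on the notes above, so check your build's API documentation.

```python
import requests  # third-party: pip install requests

BASE = "http://localhost:5001"

# Sync (non-streaming) generation. The "logprobs" flag here is an assumed
# name for the option that asks the server to record token probabilities.
resp = requests.post(f"{BASE}/api/v1/generate", json={
    "prompt": "Two roads diverged in a yellow wood,",
    "max_length": 16,
    "logprobs": True,
}).json()
print(resp["results"][0]["text"])

# Fetch the alternate token probabilities recorded for the last generation
# from the dedicated endpoint. POST with an empty JSON body is assumed.
probs = requests.post(f"{BASE}/api/extra/last_logprobs", json={}).json()
print(probs)
```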
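
And a hedged sketch of the two transcription upload styles mentioned in the fix above. The multipart form follows the OpenAI-style convention that `/v1/audio/transcriptions` mimics; the base64 endpoint path and its `audio_data` field name are assumptions for illustration.

```python
import base64
import requests

BASE = "http://localhost:5001"

# Direct .wav upload as multipart form data; this is the path the
# fragmentation fix in this release applies to.
with open("speech.wav", "rb") as f:
    out = requests.post(f"{BASE}/v1/audio/transcriptions",
                        files={"file": ("speech.wav", f, "audio/wav")}).json()
print(out)

# Base64 audio inside a JSON payload was unaffected by the bug. The
# endpoint path and field name below are assumptions, not confirmed API.
with open("speech.wav", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("ascii")
out = requests.post(f"{BASE}/api/extra/transcribe",
                    json={"audio_data": b64}).json()
print(out)
```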
To use, download and run the koboldcpp.exe, which is a one-file pyinstaller build.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller.
If you have an Nvidia GPU but use an old CPU and koboldcpp.exe does not work, try koboldcpp_oldcpu.exe.
If you have a newer Nvidia GPU, you can use the CUDA 12 version koboldcpp_cu12.exe (much larger, slightly faster).
If you're using Linux, select the appropriate Linux binary file instead (not exe).
If you're on a modern macOS (M1, M2, M3), you can try the koboldcpp-mac-arm64 macOS binary.
If you're using AMD, you can try koboldcpp_rocm at YellowRoseCx's fork here.
Run it from the command line with the desired launch parameters (see `--help`), or manually select the model in the GUI.
Once loaded, you can connect like this (or use the full koboldai client):
http://localhost:5001
For more information, be sure to run the program from the command line with the `--help` flag.
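
If you'd rather script against the server than use the browser UI, a minimal completion request looks roughly like this (a sketch using the standard KoboldAI generate payload; adjust fields to taste):

```python
import requests  # third-party: pip install requests

# Minimal completion request against a locally running koboldcpp instance.
payload = {"prompt": "Two roads diverged in a yellow wood,", "max_length": 64}
r = requests.post("http://localhost:5001/api/v1/generate", json=payload)
print(r.json()["results"][0]["text"])
```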