koboldcpp-1.64.1
- Added fixes for Llama 3 tokenization: Support updated Llama 3 GGUFs with pre-tokenizations.
- Note: In order to benefit from the tokenizer fix, the GGUF models need to be reconverted after this commit. A warning will be displayed if the model was created before this fix.
- Automatically support and apply both EOS and EOT tokens. EOT tokens are also correctly biased when EOS is banned.
- `finish_reason` is now correctly communicated in both sync and SSE streamed mode responses when token generation is stopped by EOS/EOT. Also, Kobold Lite no longer trims sentences if an EOS/EOT is detected as the stop reason in instruct mode.
- Added proper support for `trim_stop` in SSE streaming modes. Stop sequences will no longer be exposed even during streaming when `trim_stop` is enabled. Additionally, using the Chat Completions endpoint automatically applies trim stop to the instruct tag format used. This allows better out-of-the-box compatibility with third-party clients like LibreChat.
- The `--bantokens` flag has been removed. Instead, you can now submit `banned_tokens` dynamically via the generate API for each specific generation, and all matching tokens will be banned for that generation.
- Added `render_special` to the generate API, which allows you to enable rendering of special tokens like `<|start_header_id|>` or `<|eot_id|>`.
- Added a new experimental flag `--flashattention` to enable Flash Attention for compatible models.
- Added support for resizing the GUI launcher; all GUI elements will auto-scale to fit. This can be useful for high-DPI screens.
- Improved speed of rep pen sampler.
- Added additional debug information in `--debugmode`.
- Added a button for starting the benchmark feature in GUI launcher mode.
- Fixed slow CLIP processing speed on Colab
- Fixed quantization tool compilation again
- Updated Kobold Lite:
- Improved stop sequence and EOS handling
- Fixed instruct tag dropdown
- Added token filter feature
- Added enhanced regex replacement (now also allowed for submitted text)
- Support custom `{{placeholder}}` tags
- Better max context handling when used in Kcpp
- Support for Inverted world info secondary keys (triggers when NOT present)
- Language customization for XTTS
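The per-request generate API options above can be exercised with a small JSON payload. Here is a minimal sketch: the field names `banned_tokens` and `render_special` come from these notes, while the endpoint path `/api/v1/generate` and the other fields are assumptions based on KoboldCpp's usual generate parameters.

```python
import json

# Build a payload for KoboldCpp's generate endpoint (assumed: /api/v1/generate).
# "banned_tokens" and "render_special" are the per-request options from this
# release; the remaining fields are illustrative common generate parameters.
def build_generate_payload(prompt, banned_tokens=None, render_special=False):
    payload = {
        "prompt": prompt,
        "max_length": 64,
        # When True, special tokens like <|eot_id|> are rendered in the output.
        "render_special": render_special,
    }
    if banned_tokens:
        # All matching tokens are banned for this generation only.
        payload["banned_tokens"] = list(banned_tokens)
    return payload

payload = build_generate_payload("Hello", banned_tokens=["badword"], render_special=True)
print(json.dumps(payload))
```

Because the ban list travels with each request, different generations can use different bans without restarting the server, which the removed `--bantokens` flag could not do.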
Hotfix 1.64.1: Fixed LLAVA being incoherent from the second generation onwards. Also, the GUI launcher has been tidied up: lowvram is now removed from the quick launch tab and appears only in the hardware tab. `--benchmark` now includes the version and gives clearer exit instructions in the console output. Fixed some tkinter error outputs on quit.
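For clients consuming the SSE streamed mode, the `finish_reason` mentioned in the notes above arrives inside `data:` events. The sketch below parses one such event; the exact event schema is an assumption for illustration, as the notes only state that `finish_reason` is communicated when generation stops on EOS/EOT.

```python
import json

# Parse one SSE line from a streamed response. Lines that are not "data:"
# events (comments, keep-alives) are ignored. The JSON shape shown in the
# example call is an assumption, not the documented schema.
def parse_sse_event(line):
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):].strip())

event = parse_sse_event('data: {"token": "", "finish_reason": "stop"}')
if event and event.get("finish_reason") == "stop":
    print("generation stopped by EOS/EOT")
```

A streaming client can use this to stop reading as soon as a stop reason appears, rather than waiting for the connection to close.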
To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
If you're using Linux, select the appropriate Linux binary file instead (not exe).
If you're using AMD, you can try koboldcpp_rocm from YellowRoseCx's fork.
Run it from the command line with the desired launch parameters (see `--help`), or manually select the model in the GUI.
Once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
For more information, be sure to run the program from the command line with the `--help` flag.