github LostRuins/koboldcpp v1.64.1

koboldcpp-1.64.1
  • Added fixes for Llama 3 tokenization: Support updated Llama 3 GGUFs with pre-tokenizations.
    • Note: In order to benefit from the tokenizer fix, the GGUF models need to be reconverted after this commit. A warning will be displayed if the model was created before this fix.
  • Automatically support and apply both EOS and EOT tokens. EOT tokens are also correctly biased when EOS is banned.
  • finish_reason is now correctly reported in both sync and SSE-streamed responses when token generation is stopped by EOS/EOT. Also, Kobold Lite no longer trims sentences if an EOS/EOT is detected as the stop reason in instruct mode.
  • Added proper support for trim_stop in SSE streaming modes. Stop sequences will no longer be exposed even during streaming when trim_stop is enabled. Additionally, using the Chat Completions endpoint automatically applies trim stop to the instruct tag format used. This allows better out-of-box compatibility with third party clients like LibreChat.
  • The --bantokens flag has been removed. Instead, you can now submit banned_tokens dynamically via the generate API for each specific generation, and all matching tokens will be banned for that generation.
  • Added render_special to the generate API; when enabled, special tokens like <|start_header_id|> or <|eot_id|> are rendered in the output.
  • Added new experimental flag --flashattention to enable Flash Attention for compatible models.
  • Added support for resizing the GUI launcher; all GUI elements auto-scale to fit. This can be useful for high-DPI screens.
  • Improved speed of rep pen sampler.
  • Added additional debug information in --debugmode.
  • Added a button for starting the benchmark feature in GUI launcher mode.
  • Fixed slow CLIP processing speed on Colab.
  • Fixed quantization tool compilation again.
  • Updated Kobold Lite:
    • Improved stop sequence and EOS handling
    • Fixed instruct tag dropdown
    • Added token filter feature
    • Added enhanced regex replacement (now also allowed for submitted text)
    • Support custom {{placeholder}} tags.
    • Better max context handling when used in Kcpp
    • Support for Inverted world info secondary keys (triggers when NOT present)
    • Language customization for XTTS
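
The new per-request options above (banned_tokens, render_special, trim_stop) can be combined in a single generate call. A minimal sketch, assuming the KoboldAI-compatible /api/v1/generate endpoint; the field names follow these release notes, but the exact accepted schema is an assumption, not a definitive reference:

```python
import json

# Build a single generate request combining the new per-request options.
payload = {
    "prompt": "### Instruction:\nSay hello.\n\n### Response:\n",
    "max_length": 64,
    # Replaces the removed --bantokens flag: matching tokens are banned
    # for this generation only.
    "banned_tokens": ["badword"],
    # Render special tokens such as <|eot_id|> in the output.
    "render_special": True,
    # Hide stop sequences from the response, including in SSE streams.
    "trim_stop": True,
    "stop_sequence": ["### Instruction:"],
}
body = json.dumps(payload)

# With a local server running, you could POST this with the standard library:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:5001/api/v1/generate",
#     data=body.encode("utf-8"),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode("utf-8"))
```

Because banned_tokens is per-request, different generations in the same session can ban different tokens without restarting the server.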

Hotfix 1.64.1: Fixed LLAVA becoming incoherent from the second generation onwards. The GUI launcher has also been tidied up: lowvram has been removed from the Quick Launch tab and now appears only in the Hardware tab. --benchmark now includes the version and gives clearer exit instructions in its console output. Fixed some tkinter error outputs on quit.

To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
If you're using Linux, select the appropriate Linux binary file instead (not exe).
If you're using AMD, you can try koboldcpp_rocm from YellowRoseCx's fork.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once the model is loaded, you can connect at the following address (or use the full KoboldAI client):
http://localhost:5001
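
Besides the browser UI, generations can be requested over HTTP. A minimal sketch, assuming the default port 5001 and the KoboldAI-compatible generate endpoint (the curl invocation is shown for illustration):

```shell
# Build a request body for the local KoboldCpp server (default port 5001).
BODY='{"prompt": "Hello, world.", "max_length": 32}'
echo "$BODY"
# With the server running, the request could be submitted like this:
# curl -s -X POST http://localhost:5001/api/v1/generate \
#   -H "Content-Type: application/json" -d "$BODY"
```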

For more information, be sure to run the program from command line with the --help flag.
