
koboldcpp-1.49

  • New API feature: Split Memory - The generation payload now supports a memory field in addition to the usual prompt field. If set, this string is forcefully prepended to any submitted prompt text. If the resulting context exceeds the limit, text is overwritten from the beginning of the main prompt until it fits. Useful to guarantee that memory is always fully inserted, even when you cannot determine the exact token count. Automatically used in Lite. A request sketch after this list shows memory together with trim_stop.
  • New API feature: trim_stop can be added to the generate payload. If true, removes detected stop_sequences from the output and truncates all text after them. Does not work with SSE streaming.
  • New API feature: --preloadstory now allows you to specify a JSON file (such as a story savefile) when launching the server. This file will be hosted on the server at /api/extra/preloadstory, which frontends (such as Kobold Lite) can access over the API; a fetch sketch also follows below.
  • Pulled various improvements and fixes from upstream llama.cpp
  • Updated Kobold Lite: added new TTS options and fixed some bugs with the Retry button when aborting. Added support for World Info inject position, split memory, and preloaded stories. Also added support for optional image generation using DALL-E 3 (OAI API).
  • Fixed KoboldCpp Colab prebuilts crashing on some older Colab CPUs. It should now also work on A100 and V100 GPUs in addition to the free-tier T4s. If it fails, try enabling the ForceRebuild checkbox. The LLAMA_PORTABLE=1 makefile flag can now be used when making builds that target Colab or Docker.
  • Various other minor fixes.
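
As an illustration of the two new generate-payload fields above, here is a minimal sketch of a request against a locally running server. It assumes the standard KoboldAI /api/v1/generate endpoint on the default port 5001; the prompt text, max_length value, and stop_sequence list are placeholders, not values taken from this release.

```python
import json
import urllib.request

# Sketch only: assumes a koboldcpp server running locally on the default port.
API_URL = "http://localhost:5001/api/v1/generate"

payload = {
    # "memory" is prepended to the prompt; if the combined text exceeds the
    # context limit, the start of the main prompt is overwritten instead.
    "memory": "[Character: Alice, a helpful assistant.]\n",
    "prompt": "User: What is the capital of France?\nAlice:",
    "max_length": 80,
    # trim_stop removes the detected stop sequence (and any text after it)
    # from the output. Not applied when using SSE streaming.
    "trim_stop": True,
    "stop_sequence": ["User:"],
}

req = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

# The KoboldAI API returns generated text under results[0]["text"].
print(result["results"][0]["text"])
```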
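
And a sketch of how a frontend might pick up a preloaded story on startup, assuming the server was launched with --preloadstory pointing at a story savefile (the filename below is hypothetical):

```python
import json
import urllib.error
import urllib.request

# Assumes the server was started with something like:
#   koboldcpp.exe --preloadstory mystory.json
PRELOAD_URL = "http://localhost:5001/api/extra/preloadstory"

try:
    with urllib.request.urlopen(PRELOAD_URL) as resp:
        story = json.load(resp)
    # The response is the JSON story file passed to --preloadstory.
    print("Preloaded story keys:", list(story.keys()))
except urllib.error.HTTPError:
    print("No preloaded story available on this server.")
```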

To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
If you're using AMD, you can try koboldcpp_rocm from YellowRoseCx's fork.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once the model is loaded, you can connect at http://localhost:5001 (or use the full koboldai client).

For more information, be sure to run the program from command line with the --help flag.
