koboldcpp-1.40.1

This release is mostly for bugfixes to the previous one, but enough small stuff has changed that I chose to make it a new version instead of a patch for the previous one.

Fixed a regression in format detection for LLAMA 70B.
Converted the embedded horde worker into daemon mode, which hopefully solves the occasional exceptions.
Fixed some OOMs for blasbatchsize 2048, adjusted buffer sizes
Slight modification to the look ahead (2 to 5%) for the cuda pool malloc.
Pulled some bugfixes from upstream
Added a new field, idle, to the /api/extra/perf endpoint. It allows checking whether a generation is in progress without sending one.
Fixed cmake compilation for cudatoolkit 12.
Updated Lite, includes option for aesthetic instruct UI (early beta by @Lyrcaxis, please send them your feedback)
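
The new idle field on /api/extra/perf can be polled from a script before submitting work. The following is a minimal sketch, not an official client: it assumes the server is running at the default http://localhost:5001, that the endpoint returns a JSON object, and that idle is truthy (for example 1) when no generation is running. The helper names `is_idle` and `fetch_perf` are hypothetical.

```python
import json
import urllib.request

# Assumption: default local address from these notes; change it if you use another port.
PERF_URL = "http://localhost:5001/api/extra/perf"


def is_idle(perf_payload: dict) -> bool:
    """Return True when the `idle` field reports that no generation is in progress.

    Assumption: `idle` is truthy (e.g. 1 or true) when the server is idle.
    A missing field is treated as "not idle" to stay on the safe side.
    """
    return bool(perf_payload.get("idle", False))


def fetch_perf(url: str = PERF_URL, timeout: float = 5.0) -> dict:
    """GET the perf endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a running instance, `is_idle(fetch_perf())` would report whether the server is free. Treating a missing field as busy is a deliberate choice so that older builds without the field are not mistaken for idle ones.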

hotfix 1.40.1:

Added handling for stablecode-completion-alpha-3b.

To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once the model has loaded, you can connect like this (or use the full koboldai client):
http://localhost:5001

For more information, be sure to run the program from command line with the --help flag.

LostRuins/koboldcpp v1.40.1a koboldcpp-1.40.1 on GitHub