koboldcpp-1.37.1
- NEW: KoboldCpp now comes with an embedded Horde Worker which allows anyone to share their ggml models with the AI Horde without downloading additional dependencies. `--hordeconfig` now accepts 5 parameters: `[hordemodelname] [hordegenlength] [hordemaxctx] [hordeapikey] [hordeworkername]`. Filling in all 5 will start a Horde worker for you that serves horde requests automatically in the background (see the example launch after this list). For the previous behavior, exclude the last 2 parameters to continue using your own Horde worker (e.g. HaidraScribe/KAIHordeBridge). This feature can also be enabled via the GUI.
- Added support for LLAMA2 70B models. This should work automatically; GQA will be set to 8 if it's detected.
- Fixed a bug with mirostat v2 that was causing overly deterministic results. Please try it again. (Credit: @ycros)
- Added additional information to `/api/extra/perf` for the last generation, including the stopping reason as well as generated token counts (query example after this list).
- Exposed the parameter for `--tensor_split`, which works exactly like it does upstream. Only for CUDA (usage example after this list).
- Try to support Kepler as a target for CUDA as well on henky's suggestion; can't guarantee it will work as I don't have a K80, but it might.
- Retained support for `--blasbatchsize 1024` after it was removed upstream. Scratch & KV buffer sizes will be larger when using this (example after this list).
- Minor bugfixes, pulled other upstream fixes and optimizations, updated Kobold Lite (chat mode improvements).
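As a sketch of the embedded Horde worker, a launch might look like the following; the model filename, gen length, max context, API key, and worker name are all placeholder values:

```
# Hypothetical launch: serve a model to the AI Horde with the embedded worker.
# --hordeconfig order: [hordemodelname] [hordegenlength] [hordemaxctx] [hordeapikey] [hordeworkername]
koboldcpp.exe --model llama2-70b.ggmlv3.q4_0.bin --hordeconfig MyHordeModel 256 2048 0000000000 MyWorkerName
```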
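To check the new `/api/extra/perf` info on a running instance (assuming the default port 5001; the exact fields in the response may differ):

```
# Fetch last-generation performance info, including the stopping reason and token counts.
curl http://localhost:5001/api/extra/perf
```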
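A minimal sketch of `--tensor_split` across two GPUs; the ratio values are illustrative, and CUDA is assumed to be enabled via `--usecublas`:

```
# Place roughly 3/4 of the tensors on GPU 0 and 1/4 on GPU 1.
koboldcpp.exe --model model.bin --usecublas --tensor_split 3 1
```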
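And a sketch of the retained batch size (model filename is a placeholder):

```
# Expect larger scratch & KV buffers with this setting.
koboldcpp.exe --model model.bin --blasbatchsize 1024
```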
Hotfix 1.37.1
- Fixed clblast to work correctly for LLAMA2 70B
- Fixed sending Client-Agent for embedded horde worker in addition to Bridge Agent and User Agent
- Changed `rms_norm_eps` to `5e-6` for better results for both llama1 and llama2.
- Fixed some streaming bugs in Lite.
To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
Run it from the command line with the desired launch parameters (see `--help`), or manually select the model in the GUI.
Once loaded, you can connect like this (or use the full koboldai client): http://localhost:5001
For more information, be sure to run the program from the command line with the `--help` flag.
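For instance, a basic launch might look like this (the model filename is a placeholder; 5001 is the default port):

```
# Start the server, then open http://localhost:5001 in a browser.
koboldcpp.exe --model yourmodel.ggmlv3.q4_0.bin --port 5001
```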