koboldcpp-1.39.1
- Fix SSE streaming to handle headers correctly during abort (Credits: @duncannah)
- Bugfix for `--blasbatchsize -1` and `1024` (fix alloc blocks error)
- Added experimental support for `--blasbatchsize 2048` (note, buffers are doubled if that is selected, using much more memory)
- Added support for 12k and 16k `--contextsize` options. Please let me know if you encounter issues.
- Pulled upstream improvements, further CUDA speedups for MMQ mode for all quant types.
- Fix for some LLAMA 65B models being detected as LLAMA2 70B models.
- Revert to upstream approach for CUDA pool malloc (1.39.1 - done only for MMQ).
- Updated Lite: added support for importing Tavern V2 card formats with world info (character book), and clearer settings edit boxes.
To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
Run it from the command line with the desired launch parameters (see `--help`), or manually select the model in the GUI.
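For example, a launch that uses the larger context size and BLAS batch size from this release might look like the sketch below; the model filename is just a placeholder and the `--model` flag is assumed here, so check `--help` for the exact parameters and accepted values on your build:

```
koboldcpp.exe --model yourmodel.bin --contextsize 16384 --blasbatchsize 2048
```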
Once loaded, you can connect at http://localhost:5001 (or use the full KoboldAI client).
For more information, be sure to run the program from the command line with the `--help` flag.
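As a quick connectivity check once the server is running, a minimal sketch like the following sends a prompt to the KoboldAI-compatible API that koboldcpp exposes on the port above; the `/api/v1/generate` route and field names assume the standard KoboldAI API, so adjust if your setup differs:

```
curl http://localhost:5001/api/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, my name is", "max_length": 50}'
```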