koboldcpp-1.38

Added upstream support for Quantized MatMul (MMQ) prompt processing, a new option for CUDA (enabled by adding --usecublas mmq or by toggling it in the GUI). This uses slightly less memory, and is slightly faster for Q4_0 but slower for K-quants.
Fixed SSE streaming for multibyte characters (for Tavern compatibility)
--noavx2 mode no longer uses OpenBLAS (same as Failsafe), due to numerous compatibility complaints.
GUI dropdown preset only displays built platforms (Credit: @YellowRoseCx)
Added a Help button in the GUI
Fixed an issue with mirostat not reading the correct value from the GUI
Fixed an issue with the context size slider being limited to 4096 in the GUI
Displays a terminal warning if the received context exceeds the maximum context allocated by the launcher

To use, download and run koboldcpp.exe, which is a one-file PyInstaller executable.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once the model has loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
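
If you would rather check from a script that the server is up before pointing a client at it, something like the following Python sketch may help. It only assumes the address shown above (localhost, port 5001) and that the server answers plain HTTP there. The helper names server_url and is_reachable are made up for this example and are not part of koboldcpp.

```python
import urllib.error
import urllib.request


def server_url(host="localhost", port=5001):
    """Build the base URL; 5001 is the port shown in these notes."""
    return f"http://{host}:{port}"


def is_reachable(url, timeout=2.0):
    """Return True if something answers HTTP at `url`, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just with an error status code.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return False


if __name__ == "__main__":
    url = server_url()
    state = "is up" if is_reachable(url) else "is not reachable"
    print(url, state)
```

This only tells you that something is listening on that port, not that a model has finished loading, so treat it as a rough check.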

For more information, be sure to run the program from command line with the --help flag.

LostRuins/koboldcpp v1.38 koboldcpp-1.38 on GitHub
