koboldcpp-1.73

NEW: Added dual-stack (IPv6) network support. KoboldCpp now properly runs on IPv6 networks, and the same instance can automatically serve both IPv4 and IPv6 addresses on the same port. This should also fix problems with resolving localhost on some systems. Please report any issues you encounter.
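For readers running their own services, the general dual-stack technique can be sketched with Python's standard library. This is a minimal illustration, not KoboldCpp's actual server code: `open_listener` is a hypothetical helper, and the sketch assumes Python 3.8+ for `socket.create_server` and `socket.has_dualstack_ipv6`.

```python
import socket


def open_listener(port: int = 5001) -> socket.socket:
    """Open one TCP listening socket that accepts IPv4 and IPv6 clients if the OS allows it.

    Hypothetical helper for illustration; not taken from KoboldCpp.
    """
    if socket.has_dualstack_ipv6():
        # Bind to the IPv6 wildcard address with IPV6_V6ONLY disabled, so IPv4
        # clients are accepted too (they appear as IPv4-mapped IPv6 addresses).
        return socket.create_server(("", port), family=socket.AF_INET6, dualstack_ipv6=True)
    # Fallback when dual-stack is unavailable: listen on IPv4 only.
    return socket.create_server(("", port), family=socket.AF_INET)
```

With a dual-stack listener, both `http://127.0.0.1:5001` and `http://[::1]:5001` reach the same socket, so it no longer matters whether `localhost` resolves to the IPv4 or the IPv6 loopback address.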
NEW: Added official macOS PyInstaller binary builds! Users of modern Apple Silicon Macs (M1, M2, M3) can now use KoboldCpp without self-compiling: simply download and run koboldcpp-mac-arm64. Special thanks to @henk717 for setting this up.
NEW: Pure CLI Mode. Added --prompt, which allows KoboldCpp to be used entirely from the command line. When running with --prompt, all other console output is suppressed except for the response to that prompt, which is piped directly to stdout. You can control the output length with --promptlimit. These two flags can also be combined with --benchmark, allowing you to benchmark with a custom prompt and get the response back. Note that this mode is only intended for quick testing and simple usage; sampler settings are not configurable.
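Because only the response reaches stdout in this mode, pure CLI mode is easy to script. The sketch below wraps it with Python's `subprocess`. The executable path, the model filename, and the use of `--model` to select the model file are assumptions for illustration, and `build_cli_command`/`run_prompt` are hypothetical helper names.

```python
import subprocess
from typing import List, Optional


def build_cli_command(model_path: str, prompt: str,
                      promptlimit: Optional[int] = None,
                      executable: str = "./koboldcpp") -> List[str]:
    """Assemble argv for pure CLI mode (executable name and --model usage are assumptions)."""
    cmd = [executable, "--model", model_path, "--prompt", prompt]
    if promptlimit is not None:
        # --promptlimit controls the length of the generated response.
        cmd += ["--promptlimit", str(promptlimit)]
    return cmd


def run_prompt(model_path: str, prompt: str,
               promptlimit: Optional[int] = None,
               executable: str = "./koboldcpp") -> str:
    """Run one prompt and return the response written to stdout."""
    result = subprocess.run(
        build_cli_command(model_path, prompt, promptlimit, executable),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

For example, `run_prompt("model.gguf", "Write a haiku about rain.", promptlimit=80)`. Passing the arguments as a list rather than a shell string avoids quoting problems with prompts that contain spaces or quotes.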
Changed the default benchmark prompt to prevent a stack overflow with the old BPE tokenizer.
Added pre-filtering to the top 5000 token candidates before sampling. This greatly improves sampling speed on models with very large vocabularies, with negligible changes to responses.
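The idea behind the pre-filter can be illustrated in Python: select the 5000 highest logits with a partial selection, then compute probabilities and sample only over those. This is a simplified sketch, not KoboldCpp's C++ sampler; it uses `heapq.nlargest` with plain temperature sampling, and the helper names are hypothetical. Tokens outside the top 5000 typically carry very little probability mass, which is why dropping them barely changes the output.

```python
import heapq
import math
import random
from typing import List, Optional, Sequence, Tuple


def prefilter_top_n(logits: Sequence[float], n: int = 5000) -> List[Tuple[int, float]]:
    """Return (token_id, logit) pairs for the n highest logits, highest first."""
    # nlargest avoids fully sorting a vocabulary that may have 100k+ entries.
    return heapq.nlargest(n, enumerate(logits), key=lambda pair: pair[1])


def sample_token(logits: Sequence[float], n: int = 5000,
                 temperature: float = 1.0,
                 rng: Optional[random.Random] = None) -> int:
    """Temperature-sample a token id from only the top-n candidates."""
    rng = rng if rng is not None else random.Random()
    candidates = prefilter_top_n(logits, n)
    top = candidates[0][1]
    # Subtract the max logit before exponentiating for numerical stability.
    weights = [math.exp((logit - top) / temperature) for _, logit in candidates]
    ids = [token_id for token_id, _ in candidates]
    return rng.choices(ids, weights=weights, k=1)[0]
```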
Moved chat completions adapter selection to Model Files tab.
Improved GPU layer estimation by accounting for VRAM that is already in use.
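As a toy illustration of why in-use VRAM matters (not KoboldCpp's actual heuristic, whose details are not described in these notes), a layer estimate can budget only the memory that is actually free. The function name, the fixed per-layer size, and the reserve for context buffers are all hypothetical simplifications.

```python
def estimate_gpu_layers(total_vram_mb: float, used_vram_mb: float,
                        layer_size_mb: float, n_layers: int,
                        reserve_mb: float = 1024.0) -> int:
    """Estimate how many layers fit in VRAM (hypothetical, simplified model)."""
    # Only memory not already used by other programs is available for offloading.
    free_mb = total_vram_mb - used_vram_mb - reserve_mb
    if free_mb <= 0 or layer_size_mb <= 0:
        return 0
    # Never offload more layers than the model has.
    return min(n_layers, int(free_mb // layer_size_mb))
```

With 8192 MB of VRAM, 2048 MB already in use, and 200 MB layers, this sketch offloads 25 layers; ignoring the in-use memory would suggest 35 and likely overcommit the GPU.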
--multiuser now defaults to true. Set --multiuser 0 to disable it.
Updated Kobold Lite with multiple fixes and improvements.
Merged fixes and improvements from upstream, including Minitron and MiniCPM features. (Note: some broken Minitron models are floating around; if you get stuck, try this one first!)

NOTICE: DRY is completely broken in this version and will also cause issues in SillyTavern (ST). A fix is currently being developed. For now, do not use DRY, or switch back to 1.72.

To use, download and run koboldcpp.exe, which is a single-file PyInstaller executable.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
If you have an Nvidia GPU, but use an old CPU and koboldcpp.exe does not work, try koboldcpp_oldcpu.exe
If you have a newer Nvidia GPU, you can use the CUDA 12 version koboldcpp_cu12.exe (much larger, slightly faster).
If you're using Linux, select the appropriate Linux binary instead (not the .exe).
If you're on a modern Apple Silicon Mac (M1, M2, M3), you can try the koboldcpp-mac-arm64 macOS binary.
If you have an AMD GPU, you can try koboldcpp_rocm at YellowRoseCx's fork here

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once the model is loaded, you can connect like this (or use the full KoboldAI Client):
http://localhost:5001
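Besides the browser UI, the same address also serves KoboldCpp's KoboldAI-compatible HTTP API. A minimal standard-library Python client might look like the sketch below. It assumes the default port 5001, the `/api/v1/generate` endpoint with `prompt` and `max_length` fields, and a response of the form `{"results": [{"text": ...}]}`; the helper names are hypothetical.

```python
import json
import urllib.request


def build_generate_request(prompt: str, max_length: int = 80,
                           base_url: str = "http://localhost:5001") -> urllib.request.Request:
    """Build a POST request for the KoboldAI-style generate endpoint."""
    payload = json.dumps({"prompt": prompt, "max_length": max_length}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, max_length: int = 80,
             base_url: str = "http://localhost:5001") -> str:
    """Send the prompt to a running instance and return the generated text."""
    req = build_generate_request(prompt, max_length, base_url)
    with urllib.request.urlopen(req, timeout=300) as resp:
        body = json.load(resp)
    # Assumed response shape: {"results": [{"text": "..."}]}
    return body["results"][0]["text"]
```

For example, `print(generate("Once upon a time"))` while KoboldCpp is running with a model loaded.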

For more information, be sure to run the program from the command line with the --help flag.

LostRuins/koboldcpp v1.73 koboldcpp-1.73 on GitHub