koboldcpp-1.40.1
This release is mostly bugfixes for the previous one, but enough small things have changed that I chose to make it a new version instead of a patch.
- Fixed a regression in format detection for LLAMA 70B.
- Converted the embedded horde worker to daemon mode, which should hopefully solve the occasional exceptions.
- Fixed some out-of-memory (OOM) errors with blasbatchsize 2048 and adjusted buffer sizes.
- Slightly modified the look-ahead for the CUDA pool malloc (from 2% to 5%).
- Pulled some bugfixes from upstream.
- Added a new field `idle` to the `/api/extra/perf` endpoint, which allows checking whether a generation is in progress without sending one.
- Fixed cmake compilation for cudatoolkit 12.
- Updated Lite, which now includes an option for an aesthetic instruct UI (early beta by @Lyrcaxis; please send them your feedback).
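As a rough illustration of the `idle` field, the Python sketch below queries `/api/extra/perf` on the default local address and reports whether a generation is running. It assumes the endpoint returns a JSON object whose `idle` value is truthy (for example `1` or `true`) when nothing is being generated; the helper names `parse_idle` and `is_idle` are hypothetical, and the exact response shape should be checked against your running server.

```python
import json
import urllib.request

# Default address from these release notes; change it if you launched on another port.
BASE_URL = "http://localhost:5001"


def parse_idle(perf_json: str) -> bool:
    """Return True if the perf payload reports the server as idle.

    Assumption: the payload is a JSON object with an `idle` field that is
    truthy when no generation is in progress. A missing field raises KeyError
    rather than silently guessing the server state.
    """
    data = json.loads(perf_json)
    return bool(data["idle"])


def is_idle(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Query /api/extra/perf and report whether the server is idle (hypothetical helper)."""
    with urllib.request.urlopen(base_url + "/api/extra/perf", timeout=timeout) as resp:
        return parse_idle(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print("idle" if is_idle() else "busy: a generation is in progress")
```

Polling this endpoint before submitting a request lets a client avoid queuing behind a generation that is already running, without having to send a test prompt.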
Hotfix 1.40.1:
- Added handling for stablecode-completion-alpha-3b.
To use it, download and run koboldcpp.exe, which is a one-file PyInstaller executable.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller.
Run it from the command line with the desired launch parameters (see `--help`), or manually select the model in the GUI.
Once the model is loaded, you can connect at the following address (or use the full KoboldAI client):
http://localhost:5001
For more information, be sure to run the program from the command line with the `--help` flag.