koboldcpp-1.37.1
- NEW: KoboldCpp now comes with an embedded Horde Worker which allows anyone to share their ggml models with the AI Horde without downloading additional dependencies. `--hordeconfig` now accepts 5 parameters: `[hordemodelname] [hordegenlength] [hordemaxctx] [hordeapikey] [hordeworkername]`. Filling in all 5 will start a Horde worker for you that serves horde requests automatically in the background (see the example launch after this list). For the previous behavior, exclude the last 2 parameters to continue using your own Horde worker (e.g. HaidraScribe/KAIHordeBridge). This feature can also be enabled via the GUI.
- Added support for LLAMA2 70B models. This should work automatically; GQA will be set to 8 if it's detected.
- Fixed a bug with mirostat v2 that was causing overly deterministic results. Please try it again. (Credit: @ycros)
- Added additional information to `/api/extra/perf` for the last generation, including the stopping reason as well as generated token counts (query example after this list).
- Exposed the parameter for `--tensor_split`, which works exactly like it does upstream. Only for CUDA (usage example after this list).
- Try to support Kepler as a target for CUDA as well on henky's suggestion; can't guarantee it will work as I don't have a K80, but it might.
- Retained support for `--blasbatchsize 1024` after it was removed upstream. Scratch & KV buffer sizes will be larger when using this (example after this list).
- Minor bugfixes, pulled other upstream fixes and optimizations, updated Kobold Lite (chat mode improvements).
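As a sketch of the embedded Horde worker, a launch might look like the following; the model filename, gen length, max context, API key, and worker name are all placeholder values:

```
# Hypothetical launch: serve a model to the AI Horde with the embedded worker.
# --hordeconfig order: [hordemodelname] [hordegenlength] [hordemaxctx] [hordeapikey] [hordeworkername]
koboldcpp.exe --model llama2-70b.ggmlv3.q4_0.bin --hordeconfig MyHordeModel 256 2048 0000000000 MyWorkerName
```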
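To check the new `/api/extra/perf` info on a running instance (assuming the default port 5001; the exact fields in the response may differ):

```
# Fetch last-generation performance info, including the stopping reason and token counts.
curl http://localhost:5001/api/extra/perf
```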
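A minimal sketch of `--tensor_split` across two GPUs; the ratio values are illustrative, and CUDA is assumed to be enabled via `--usecublas`:

```
# Place roughly 3/4 of the tensors on GPU 0 and 1/4 on GPU 1.
koboldcpp.exe --model model.bin --usecublas --tensor_split 3 1
```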
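And a sketch of the retained batch size (model filename is a placeholder):

```
# Expect larger scratch & KV buffers with this setting.
koboldcpp.exe --model model.bin --blasbatchsize 1024
```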
Hotfix 1.37.1
- Fixed clblast to work correctly for LLAMA2 70B
- Fixed sending Client-Agent for embedded horde worker in addition to Bridge Agent and User Agent
- Changed `rms_norm_eps` to `5e-6` for better results for both llama1 and llama2.
- Fixed some streaming bugs in Lite.
To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
Run it from the command line with the desired launch parameters (see `--help`), or manually select the model in the GUI.
Once loaded, you can connect like this (or use the full koboldai client): http://localhost:5001
For more information, be sure to run the program from the command line with the `--help` flag.
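For instance, a basic launch might look like this (the model filename is a placeholder; 5001 is the default port):

```
# Start the server, then open http://localhost:5001 in a browser.
koboldcpp.exe --model yourmodel.ggmlv3.q4_0.bin --port 5001
```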