llamacpp-for-kobold-1.0.6-beta
- This is an experimental release containing a new OpenBLAS integration, which should speed up initial prompt processing on compatible systems by more than 2x!
- Updated the embedded Kobold Lite to the latest version, which supports pseudo token streaming. This should make the UI feel much more responsive during generation.
- Switched to argparse; you can now view all command line flags with `llamacpp-for-kobold.exe --help`.
- To disable OpenBLAS, run with the `--noblas` flag. Please report any issues you encounter, and include your specific OS and platform.
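For example, a minimal sketch of both invocations (the model filename is a placeholder, and passing the model path as a positional argument is an assumption):

```
:: list every available command line flag
llamacpp-for-kobold.exe --help

:: load a model with the OpenBLAS integration disabled
llamacpp-for-kobold.exe model.bin --noblas
```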
To use, download and run llamacpp-for-kobold.exe.
Alternatively, drag and drop a compatible quantized llamacpp model onto the .exe, or run it and manually select the model in the popup dialog.
Once the model is loaded, you can connect at http://localhost:5001 (or use the full KoboldAI client).
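As a quick connectivity check, you can also query the server from the command line. This is only a sketch; the endpoint path and JSON fields are assumptions based on the standard KoboldAI API:

```
:: send a short generation request to the local server (endpoint and fields assumed from the KoboldAI API)
curl http://localhost:5001/api/v1/generate -H "Content-Type: application/json" -d "{\"prompt\": \"Hello\", \"max_length\": 32}"
```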