llamacpp-for-kobold-1.0.7
- Added support for the new version of the ggml llamacpp model format (magic=ggjt, version 3). All older versions will continue to be supported.
- Integrated speed improvements from the parent repo.
- Fixed a UTF-8 encoding issue in the outputs.
- Improved console debug information during generation; it now shows token progress and time taken directly.
- Set non-streaming as the default mode. You can enable streaming with the `--stream` flag.
To use, download and run llamacpp-for-kobold.exe. Alternatively, drag and drop a compatible quantized llamacpp model onto the .exe, or run it and select the model manually in the popup dialog.
Once the model has loaded, you can connect at http://localhost:5001 (or use the full KoboldAI client).
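
If you prefer to script against the server instead of using the browser UI, you can also call the KoboldAI-compatible HTTP API directly. The sketch below is a minimal, hedged example: it assumes the default port (5001), the `/api/v1/generate` endpoint, and illustrative sampling parameters; adjust the payload to suit your model and settings.

```python
# Minimal sketch: query the local llamacpp-for-kobold server through its
# KoboldAI-compatible API. Assumes the default address http://localhost:5001
# and the /api/v1/generate endpoint; the parameter values are illustrative.
import requests

payload = {
    "prompt": "Once upon a time,",
    "max_length": 80,      # number of tokens to generate (assumed example value)
    "temperature": 0.7,    # sampling temperature (assumed example value)
}

resp = requests.post("http://localhost:5001/api/v1/generate", json=payload)
resp.raise_for_status()

# The response body contains a list of results, each with the generated text.
print(resp.json()["results"][0]["text"])
```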