koboldcpp-1.47.2
- Added an optional OpenAI adapter from #466 (thanks @lofcz). This is an unofficial extension of the v1 OpenAI Chat Completions endpoint that allows the instruct tags to be customized over the API (see the sketch after this list). The native Kobold API still provides better functionality and flexibility overall.
- Pulled upstream support for added token merges, as used by ChatML (the merges have to come from a correctly converted GGUF model though; overall, ChatML is still an inferior prompt template compared to Alpaca/Vicuna/LLAMA2).
- Embedded Horde Worker improvements: on too many errors, the worker now pauses with an auto-recovery timeout instead of halting outright. It will still be halted if the total error count exceeds a much higher threshold.
- Bug fixes for a multiuser race condition in polled streaming, and for Top-K values being wrongly clamped (thanks @raefu @kalomaze)
- Improved server CORS and content-type handling.
- Added GUI input for tensor_split fields (thanks @AAbushady)
- Fixed support for GGUFv1 Falcon models, which was broken due to the upstream rewrite of the BPE tokenizer.
- Pulled other fixes and optimizations from upstream
- Updated KoboldCpp Colab, now with the new Tiefighter model (try it here)
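For the OpenAI adapter mentioned above, a request might look like the sketch below. This is a hedged example: the `adapter` field and its key names are assumptions based on the feature description, so consult #466 for the exact schema.

```python
# Hedged sketch of the optional adapter on the v1 Chat Completions endpoint.
# The "adapter" object and its key names are assumptions; see #466 for the
# actual schema supported by your build.
import requests

resp = requests.post(
    "http://localhost:5001/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
        # Hypothetical instruct-tag overrides (Alpaca-style tags shown):
        "adapter": {
            "user_start": "### Instruction:\n",
            "user_end": "\n",
            "assistant_start": "### Response:\n",
            "assistant_end": "\n",
        },
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```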
Hotfix 1.47.1 - Fixed a race condition with SSE streaming. Tavern streaming should be reliable now.
Hotfix 1.47.2 - Fixed an issue with older multilingual GGUFs needing an alternate BPE tokenizer.
Updates for Embedded Kobold Lite:
- SSE streaming for Kobold Lite has been implemented! It requires a relatively recent browser; toggle it on in settings. (A client-side sketch is shown after this list.)
- Added Browser Storage Save Slots! You can now save stories directly within the browser session itself. This is intended as temporary storage for swapping between and trying out multiple stories; note that browser storage is wiped when the browser cache/history is cleared!
- Added World Info Search Depth
- Added Group Chat Management Panel (You can temporarily toggle the participants in a group chat)
- Added AUTOMATIC1111 integration! It's finally here: you can now generate images from a local A1111 install as an alternative to Horde (see the second sketch after this list).
- Lots of miscellaneous fixes and improvements. If you encounter any issues, do report them here.
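For the new SSE streaming, a minimal Python client sketch follows. The endpoint path `/api/extra/generate/stream` and the payload fields are assumptions based on the KoboldCpp extra API; check your build's API documentation if they differ.

```python
# Minimal sketch: consume KoboldCpp's SSE token stream with requests.
# Endpoint path and payload fields are assumed; verify against your build.
import json
import requests

resp = requests.post(
    "http://localhost:5001/api/extra/generate/stream",
    json={"prompt": "Once upon a time,", "max_length": 80},
    stream=True,  # read the response incrementally as tokens arrive
)
for line in resp.iter_lines():
    # SSE frames arrive as lines like: b'data: {"token": "..."}'
    if line.startswith(b"data:"):
        payload = json.loads(line[len(b"data:"):].strip())
        print(payload.get("token", ""), end="", flush=True)
```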
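For the AUTOMATIC1111 integration, Lite talks to a local A1111 instance over its web API, which requires A1111 to be launched with the `--api` flag. The sketch below shows the underlying txt2img call for illustration; it uses A1111's documented endpoint and default address, not KoboldCpp's own code path.

```python
# Sketch of the A1111 txt2img API that image generation relies on.
# Assumes a local A1111 instance started with --api at the default address.
import base64
import requests

resp = requests.post(
    "http://127.0.0.1:7860/sdapi/v1/txt2img",
    json={"prompt": "a watercolor castle", "steps": 20},
)
# A1111 returns base64-encoded PNGs in the "images" list
with open("out.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["images"][0]))
```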
To use, download and run koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller.
If you're using AMD, you can try koboldcpp_rocm from YellowRoseCx's fork here.
Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI. Once the model is loaded, you can connect at http://localhost:5001 (or use the full KoboldAI client). For more information, be sure to run the program from the command line with the --help flag.
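As an example, a typical invocation might be `koboldcpp.exe --model yourmodel.gguf --port 5001`, where the model filename is a placeholder for your own GGUF file; run --help for the full list of launch parameters.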