KoboldCpp-1.58.yr0-ROCm
Upstream Changelog:
- Added a toggle for row split mode with cuda multigpu. Split mode changed to layer split by default. If using command line, add rowsplit to --usecublas to enable row split mode. With the GUI launcher, it's a checkbox toggle.
- Multiple bugfixes: fixed benchmark command, fixed SSL streaming issues, fixed some SSE formatting with OAI endpoints.
- Make context shifting more forgiving when determining eligibility.
- Upgraded CLBlast to latest version, should result in a modest prompt processing speedup when using CL.
- Various improvements and bugfixes merged from upstream.
- Updated Kobold Lite with many improvements and new features:
  - New: Integrated 'AI Vision' for images. This uses AI Horde or a local A1111 endpoint to perform image interrogation, allowing the AI to recognize and interpret uploaded or generated images. This should provide an option for multimodality similar to llava, although not as precise. Click on any image and you can enable it within Lite. This functionality is not provided by KCPP itself.
  - New: Importing characters from Pygmalion.Chat is now supported in Lite; select it from scenarios.
  - Added option to run Lite in background. It plays a dynamically generated silent audio sound. This should prevent the browser tab from hibernating.
  - Fixed printable view, persist streaming text on error, fixed instruct timestamps.
  - Added "Auto" option for idle responses.
  - Allow importing images into story from local disk.
  - Multiple minor formatting and bug fixes.
To use on Windows, download and run koboldcpp_rocm.exe, which is a one-file pyinstaller build, OR download koboldcpp_rocm_files.zip and run python koboldcpp.py
(additional Python pip modules might need to be installed, like customtkinter and tk or python-tk).
To use on Linux, clone the repo and build with make LLAMA_HIPBLAS=1 -j4
(-j4 can be adjusted to your number of CPU threads for faster build times)
For a full Linux build, make sure you have the OpenBLAS and CLBlast packages installed:
For Arch Linux: Install cblas, openblas, and clblast.
For Debian: Install libclblast-dev and libopenblas-dev.
Then run make LLAMA_HIPBLAS=1 LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 -j4
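The two build variants above differ only in the flags passed to make. As a rough sketch (not part of the project itself), the helper below assembles either invocation and picks the -j value from the number of CPU threads. The function name make_command and its parameters are made up for illustration; only the make flags come from these notes.

```python
import os
import shlex


def make_command(openblas=False, clblast=False, jobs=None):
    """Return the make invocation for a hipBLAS build as a list of arguments."""
    if jobs is None:
        # Use the CPU thread count; fall back to 4 (the value used in these
        # notes) if the count cannot be determined.
        jobs = os.cpu_count() or 4
    cmd = ["make", "LLAMA_HIPBLAS=1"]
    if openblas:
        cmd.append("LLAMA_OPENBLAS=1")
    if clblast:
        cmd.append("LLAMA_CLBLAST=1")
    cmd.append(f"-j{jobs}")
    return cmd


if __name__ == "__main__":
    # Only print the commands here. To actually build, run the printed line
    # inside the cloned repo, or pass the list to subprocess.run(cmd, check=True).
    print(shlex.join(make_command()))
    print(shlex.join(make_command(openblas=True, clblast=True)))
```

Printing instead of executing keeps the sketch safe to run anywhere; the full build still needs the OpenBLAS and CLBlast packages listed above.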
If you're using NVIDIA, you can try koboldcpp.exe at LostRuins' upstream repo.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller, also at LostRuins' repo.
Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
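Besides opening that address in a browser, a script can talk to the same server. The sketch below is a minimal example and makes some assumptions: that the server exposes the KoboldAI-style /api/v1/generate endpoint, that it accepts prompt and max_length fields, and that it replies with a results list. None of these are spelled out in these notes, so check them against your running version. The helper names build_request and generate are illustrative only.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5001"  # address given in these notes


def build_request(prompt, max_length=80, base_url=BASE_URL):
    """Build a POST request for the (assumed) KoboldAI-style generate endpoint."""
    payload = {"prompt": prompt, "max_length": max_length}
    return urllib.request.Request(
        base_url + "/api/v1/generate",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, max_length=80, base_url=BASE_URL):
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, max_length, base_url)) as resp:
        body = json.load(resp)
    # Assumed response shape: {"results": [{"text": "..."}]}
    return body["results"][0]["text"]
```

With a model loaded and the server running, calling generate("Once upon a time,") should return the continuation as a string.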
For more information, be sure to run the program from command line with the --help flag.
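As an illustration of the launch parameters, including the new rowsplit option for --usecublas from the changelog, here is a small sketch that assembles a command line. It assumes the model is passed with --model, and model.gguf is a placeholder filename; the helper launch_command is hypothetical. Confirm the flags against --help for your build.

```python
import shlex


def launch_command(model_path, row_split=False):
    """Return a KoboldCpp launch command as a list of arguments."""
    # --model is assumed here; --usecublas and rowsplit come from the changelog.
    cmd = ["python", "koboldcpp.py", "--model", model_path, "--usecublas"]
    if row_split:
        # Layer split is the default; rowsplit opts back into row split mode.
        cmd.append("rowsplit")
    return cmd


if __name__ == "__main__":
    # Print the command so it can be copied into a terminal.
    print(shlex.join(launch_command("model.gguf", row_split=True)))
```

Leaving row_split off keeps the new layer split default for multi-GPU setups.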