KoboldCpp-1.57.1.yr1-ROCm
The Windows build does not yet include the Vulkan backend.
- Experimental ROCm Support for Windows was added for the following GPUs thanks to @harish0201 and @jasyuiop:
| Desktop GPUs | Laptop GPUs |
| --- | --- |
| AMD Radeon PRO W6600 | AMD Radeon PRO W6600M |
| AMD Radeon PRO W6600X | AMD Radeon PRO W6600X |
| AMD Radeon RX 6600 | AMD Radeon RX 6600S |
| AMD Radeon RX 6600 XT | AMD Radeon RX 6700S |
| AMD Radeon RX 6650 XT | AMD Radeon RX 6800S |
| AMD Radeon RX 6700 | AMD Radeon RX 6650M |
| AMD Radeon RX 6700 XT | AMD Radeon RX 6650M XT |
| AMD Radeon RX 6750 XT | AMD Radeon RX 6700M |
| AMD Radeon RX 6750 GRE 10 GB | AMD Radeon RX 6800M |
| AMD Radeon RX 6750 GRE 12 GB | AMD Radeon RX 6850M XT |
Upstream Changelog:
- Added a benchmarking feature via --benchmark, which automatically runs a benchmark with your provided settings, outputs run parameters, timing and speed information, tests for coherence, and exits on completion. You can provide a filename, e.g. --benchmark result.csv, and CSV-formatted results will be appended to that file (see the example after this changelog).
- Added temperature Quad-Sampling (set via the API with the smoothing_factor parameter), a PR from @AAbushady (credits to @kalomaze).
- Improved timing displays: the seed used is now shown, and llama.cpp-style timings are printed when run in --debugmode. These timings will appear faster because they exclude overhead and measure only the specific eval functions.
- Improved abort-generation behavior (a second user can now abort while a request is queued).
- Merged Vulkan enhancements from @0cc4m: APU memory handling and multi-GPU support. To use multiple GPUs, you can now specify additional IDs, for example --usevulkan 0 2 3, which uses the GPUs with IDs 0, 2, and 3; allocation across them is determined by --tensor_split (see the example after this changelog). Multi-GPU for Vulkan is currently configurable via the command line only; the GUI launcher does not allow selecting multiple Vulkan devices.
- Various improvements and bugfixes merged from upstream.
- Updated Kobold Lite with many improvements and new features:
  - NEW: The Aesthetic UI is now available for Story and Adventure modes as well!
  - Added an "AI Impersonate" feature for Instruct mode.
  - Added the smoothing factor, configurable in the dynamic temperature panel.
  - Added a toggle to enable printable view (unlocks vertical scrolling).
  - Added a toggle to inject timestamps, allowing the AI to be aware of time passing.
  - Persists API info for A1111 and XTTS, allows specifying custom negative prompts for image generation, and allows specifying custom Horde keys in KCPP mode.
  - Fixed XTTS handling of devices with over 100 voices, and added an option to narrate dialogue only.
  - Added a toggle to request that the A1111 backend save generated images to disk.
  - Fixed chub.ai card fetching.
- Hotfix 1.57.1: Fixed some crashes and fixed multi-GPU for Vulkan.
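A minimal example of the new benchmark flag, assuming a model file named yourmodel.gguf (a placeholder):

```sh
# Run a benchmark with the given settings and append CSV results to result.csv
python koboldcpp.py --model yourmodel.gguf --benchmark result.csv
```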
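Similarly, a sketch of a multi-GPU Vulkan launch; the device IDs and split ratios here are illustrative only:

```sh
# Use Vulkan devices 0, 2 and 3, splitting tensor allocation 3:1:1 across them
python koboldcpp.py --model yourmodel.gguf --usevulkan 0 2 3 --tensor_split 3 1 1
```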
To use on Windows, download and run koboldcpp_rocm.exe, which is a one-file pyinstaller build, OR download koboldcpp_rocm_files.zip and run python koboldcpp.py
(additional Python pip modules may need to be installed, such as customtkinter and tk or python-tk).
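If you run from the Python source and hit missing-module errors, installing the modules mentioned above is usually enough (a sketch; exact package names can vary):

```sh
# customtkinter comes from pip; tk/python-tk usually comes from your OS package manager
pip install customtkinter
```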
To use on Linux, clone the repo and build with make LLAMA_HIPBLAS=1 -j4
(-j4 can be adjusted to match your number of CPU threads for faster build times).
For a full Linux build, make sure you have the OpenBLAS and CLBlast packages installed:
For Arch Linux: install cblas, openblas, and clblast.
For Debian: install libclblast-dev and libopenblas-dev.
Then run make LLAMA_HIPBLAS=1 LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 -j4
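Putting the Linux steps together, one possible end-to-end build looks like this; the repository URL is an assumption based on this fork's release page, so substitute your actual clone source:

```sh
# Debian example; on Arch install cblas, openblas and clblast instead
sudo apt install libclblast-dev libopenblas-dev

# Clone the fork and build with hipBLAS (ROCm), OpenBLAS and CLBlast enabled
git clone https://github.com/YellowRoseCx/koboldcpp-rocm
cd koboldcpp-rocm
make LLAMA_HIPBLAS=1 LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 -j4
```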
If you're using NVIDIA, you can try koboldcpp.exe from LostRuins' upstream repo.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller, also from LostRuins' repo.
Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once the model is loaded, you can connect in your browser at http://localhost:5001 (or use the full KoboldAI client).
For more information, be sure to run the program from command line with the --help flag.
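As a concrete example, here is a typical launch plus a quick test against the KoboldAI-compatible API; the model filename is a placeholder, and the smoothing_factor value is illustrative:

```sh
# Start the server with a model on the default port
python koboldcpp.py --model yourmodel.gguf --port 5001

# Once loaded, request a generation over the API
curl -s http://localhost:5001/api/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, ", "max_length": 32, "smoothing_factor": 0.3}'
```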