Ok, so there are 4 different EXE builds here. The one named "koboldcpp_rocm.exe" has been built for RX 6000 and RX 7000 series GPUs.
The other 3 have been built for the following GPU targets: "gfx803;gfx900;gfx906;gfx1010;gfx1011;gfx1012;gfx1030;gfx1031;gfx1032;gfx1100;gfx1101;gfx1102"
Those 3 have each been built in a slightly different way because I don't know yet which one offers the best performance. After some testing, if everything works out and it improves koboldcpp-rocm, I'll go back to 1 or 2 EXE files.
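If you already know your card's gfx target name, a minimal Python sketch like this one can check it against the semicolon-separated list above. It only compares strings; it does not detect your GPU. The names `ROCM4ALL_TARGETS`, `parse_targets` and `is_supported` are hypothetical helpers for illustration and are not part of koboldcpp.

```python
# Hypothetical helper: the target string below is copied verbatim from these release notes.
ROCM4ALL_TARGETS = (
    "gfx803;gfx900;gfx906;gfx1010;gfx1011;gfx1012;"
    "gfx1030;gfx1031;gfx1032;gfx1100;gfx1101;gfx1102"
)


def parse_targets(spec: str) -> set[str]:
    """Split a semicolon-separated target list into a set, skipping empty entries."""
    return {t.strip().lower() for t in spec.split(";") if t.strip()}


def is_supported(gfx_target: str, spec: str = ROCM4ALL_TARGETS) -> bool:
    """Return True if gfx_target (e.g. "gfx1031") appears in the target list."""
    return gfx_target.strip().lower() in parse_targets(spec)


if __name__ == "__main__":
    # Example lookups: one target that is in the list and one that is not.
    for target in ("gfx1031", "gfx1103"):
        print(target, is_supported(target))
```

Running it prints `gfx1031 True` and `gfx1103 False`, since gfx1103 is not one of the listed build targets.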
koboldcpp_rocm.exe: built using the "Tensile Libraries"/GPU code provided with AMD ROCm 5.7.1.
koboldcpp_rocm4allV1.exe: built by adding the ROCm-4-All-5.7.1 Tensile Libraries to the AMD ROCm folder alongside the stock GPU code before compiling.
koboldcpp_rocm4allV2.exe: compiled using the "Tensile Libraries"/GPU code provided with AMD ROCm 5.7.1, but packaged with only the ROCm-4-All-5.7.1 Tensile Libraries when generating the .exe.
koboldcpp_rocm4allV3.exe: built by deleting the entire stock AMD ROCm 5.7.1 GPU code folder and replacing it with only the ROCm-4-All-5.7.1 Tensile Library files before compiling.
My gut says koboldcpp_rocm4allV3.exe will probably perform best of the 3 versions. If you have an RX 6000 or RX 7000 series GPU, I would compare koboldcpp_rocm.exe and koboldcpp_rocm4allV3.exe; there might be a noticeable speed difference.
koboldcpp_rocm4allV1.exe and koboldcpp_rocm4allV2.exe may also change generation and processing performance, but I would try the original and V3 files first.
Sorry for the whole mess of different EXEs, but hopefully it brings improvements to KoboldCpp-ROCm for Windows!
ROCm-4-All-5.7.1 Tensile Libraries were obtained from https://github.com/brknsoul/ROCmLibs
The full changelog for this version can be read at https://github.com/LostRuins/koboldcpp/releases/tag/v1.67
The biggest changes are the integration of Whisper.cpp into KoboldCpp and the Quantized KV Cache.