koboldcpp-1.68
- Added GradientAI Automatic RoPE calculation (thanks to @askmyteapot); this should provide better automatic RoPE scaling values for large context sizes.
- CLBlast support has been preserved, even though it has now been removed upstream. For now, I still intend to retain it for as long as feasible.
- Multi GPU is now made easy in Vulkan, with an "All" GPU option added to the GUI launcher, similar to CUDA. Also, Vulkan now defaults to the first dedicated GPU if `--usevulkan` is run without any other parameters, instead of just the first GPU on the list (thanks @0cc4m)
- The tokenize endpoint at `/api/extra/tokencount` now has an option to skip BOS tokens, by setting `special` to false (see the example after this list).
- Running a KCPP horde worker now automatically sets Whisper and SD to quiet mode.
- Allow the SD StableUI to be run even when no SD model is loaded.
- Allow `--sdclamped` to provide a custom clamp size
- Additional benchmark flags are saved (thanks @Nexesenex)
- Merged fixes and improvements from upstream
- Updated Kobold Lite:
- Fixed Whisper not working in some versions of Firefox
- Allow PTT to trigger a 'Generate More' if tapped, and still function as PTT if held.
- Fixed PWA functionality, now KoboldAI Lite can be installed as a web app even when running from KoboldCpp.
- Added a plaintext export option
- Increased the retry history stack to 3.
- Increased default non-highres image size slightly.
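
For reference, a call to the tokenize endpoint mentioned above might look like the following minimal sketch. The endpoint path and the `special` field come from this release; the rest of the request/response layout is an assumption, so verify it against the server's API documentation.

```python
# Minimal sketch: count tokens via a running KoboldCpp instance.
# Assumes the server is on localhost:5001 and that the JSON body accepts
# "prompt" plus the new "special" flag; check the API docs for the exact schema.
import json
import urllib.request

payload = {
    "prompt": "Hello world!",
    "special": False,  # skip BOS tokens in the count (new in 1.68)
}
req = urllib.request.Request(
    "http://localhost:5001/api/extra/tokencount",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))  # the response contains the token count
```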
Q: Why does KoboldCpp seem to constantly increase in filesize every single version?
A: Basically, the upstream llama.cpp CUDA maintainers believe that performance should always be prioritized over code size. Indeed, even the official llama.cpp libraries are now well over 130 MB compressed without cuBLAS runtimes, and continuing to grow in size at a geometric rate. Unfortunately, there is very little I can personally do about this.
To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
If you have a newer Nvidia GPU, you can use the CUDA 12 version koboldcpp_cu12.exe (much larger, slightly faster).
If you're using Linux, select the appropriate Linux binary file instead (not exe).
If you're using AMD, you can try koboldcpp_rocm at YellowRoseCx's fork here
Run it from the command line with the desired launch parameters (see `--help`), or manually select the model in the GUI.
Once loaded, you can connect like this (or use the full KoboldAI client): http://localhost:5001
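
If you would rather talk to the server from a script than a browser, the sketch below shows one way to send a generation request. The `/api/v1/generate` route and the field names are assumptions based on the KoboldAI-compatible API that KoboldCpp emulates; consult the built-in API documentation for the exact schema.

```python
# Minimal sketch: request a completion from a running KoboldCpp instance.
# The route and field names below are assumptions; verify them against the
# server's own API documentation.
import json
import urllib.request

payload = {
    "prompt": "Once upon a time",
    "max_length": 64,
}
req = urllib.request.Request(
    "http://localhost:5001/api/v1/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)
    print(result)  # generated text is typically under a "results" field
```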
For more information, be sure to run the program from the command line with the `--help` flag.