
koboldcpp-1.71

oh boy, another extra 30MB just for me? you shouldn't have!

  • Updated Kobold Lite:
    • Corpo UI Theme is now available for chat mode as well.
    • More accessibility labels for screen readers.
    • Enabling inject chatnames in Corpo UI now replaces the AI's displayed name.
    • Added setting for TTS narration speed.
    • Allow selecting the greeting message in Character Cards with multiple greetings.
  • NEW: Automatic GPU layer selection has been improved, thanks to the efforts of @henk717 and @Pyroserenus. You can also now set --gpulayers to -1 to have KoboldCpp guess how many layers to use. Note that this is still experimental and the estimate may not be fully accurate, so you may still get better results by selecting the GPU layer count manually (see the sketch after this list).
  • NEW: Added KoboldCpp Launch Templates. These are sharable .kcppt files that contain the setup necessary for other users to easily load and use your models. You can embed everything necessary to use a model within one file, including URLs to the desired model files, a preloaded story, and a chatcompletions adapter. Anyone using that template then immediately gets a properly configured model setup, with the correct backend, threads, GPU layers, and formats ready to use on their own machine.
    • For a demo, to run Llama3.1-8B, try koboldcpp.exe --config https://huggingface.co/koboldcpp/kcppt/resolve/main/Llama-3.1-8B.kcppt ; everything needed will be downloaded and configured automatically.
  • Fixed a crash when running a model with llava and debug mode enabled.
  • Added iq4_nl format support in Vulkan, by @0cc4m.
  • Updated embedded winclinfo for Windows, plus other minor fixes.
  • --unpack no longer includes .pyd files, as they were causing version conflicts.
  • Merged fixes and improvements from upstream, including Mistral Nemo support.
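
As a minimal sketch of the automatic layer selection mentioned above (the model filename here is a placeholder; substitute your own GGUF file):

```
# Ask KoboldCpp to estimate the GPU layer count itself (experimental in this release)
koboldcpp.exe --model MyModel.gguf --gpulayers -1
```

If the estimate over-commits your VRAM, fall back to passing an explicit --gpulayers value instead.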

To use, download and run koboldcpp.exe, which is a one-file pyinstaller build.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
If you have an Nvidia GPU but an old CPU and koboldcpp.exe does not work, try koboldcpp_oldcpu.exe.
If you have a newer Nvidia GPU, you can use the CUDA 12 version koboldcpp_cu12.exe (much larger, slightly faster).
If you're using Linux, select the appropriate Linux binary file instead (not exe).
If you're using AMD, you can try YellowRoseCx's koboldcpp_rocm fork.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
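
You can also query the running server directly; as a minimal sketch, assuming the default port and the KoboldAI-compatible generate endpoint:

```
# Send a simple generation request to the local server (default port 5001)
curl -s http://localhost:5001/api/v1/generate -H "Content-Type: application/json" -d '{"prompt": "Hello, world.", "max_length": 50}'
```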

For more information, be sure to run the program from the command line with the --help flag.
