koboldcpp-1.75

Nothing lasts forever edition

  • Important: When running from the command line, if no backend is explicitly selected (--use...), a GPU backend is now auto-selected by default when available. This can be overridden by picking a specific backend (e.g. --usecpu, --usevulkan, --usecublas). As a result, dragging and dropping a .gguf model onto the koboldcpp.exe executable will launch it with the GPU and gpulayers auto-configured (a launch sketch follows this list).
  • Important: The OpenBLAS backend has been removed and unified with the NoBLAS backend to form a single Use CPU option. This utilizes the sgemm functionality that llamafile upstreamed, so processing speeds should remain comparable. The --noblas flag is also deprecated; CPU Mode can instead be enabled with the --usecpu flag.
  • Added support for RWKV v6 models (context shifting not supported)
  • Added a new flag --showgui that allows the GUI to be shown even when command line flags are used. Instead of being applied directly, the command line flags are imported into the GUI, where they can be modified. This also works with .kcpps config files.
  • Added a warning display when loading legacy GGML models
  • Fix for the DRY sampler occasionally segfaulting on bad Unicode input.
  • Embedded Horde workers now work with password protected instances.
  • Updated Kobold Lite, with multiple fixes and improvements:
    • Added first-start welcome screen, to pick a starting UI Theme
    • Added support for OpenAI-Compatible TTS endpoints
    • Added a preview option for alternate greetings within a V2 Tavern character card.
    • Now works with Kobold API backends that have gated model lists, e.g. Tabby
    • Added display-only regex replacement, allowing you to hide or rewrite displayed text while the original text stays in the AI's context (illustrated after this list).
    • Added a new Instruct scenario to mimic CoT Reflection (Thinking)
    • Sampler presets now reset the seed, but no longer reset the generation amount setting.
    • Markdown parser fixes
    • Added system role for Metharme instruct format
    • Added a toggle for chat name format matching, allowing matching any name or only predefined names.
    • Fixed markdown image scaling
  • Merged fixes and improvements from upstream
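
To make the new launch behavior concrete, here is a minimal sketch (Python is used for all examples here; the model filename is a placeholder, not a real default) of starting koboldcpp with and without an explicit backend flag:

```python
import subprocess

MODEL = "model.gguf"  # placeholder; substitute the path to your own model

# With no --use... flag, v1.75 auto-selects a GPU backend when one is
# available and configures gpulayers automatically:
subprocess.run(["koboldcpp.exe", MODEL])

# Or pass an explicit backend flag to override auto-selection, e.g. force
# CPU mode (the replacement for the deprecated --noblas flag):
# subprocess.run(["koboldcpp.exe", MODEL, "--usecpu"])
```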
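
The display-only regex replacement lives in the Kobold Lite UI, but the underlying idea can be sketched in a few lines; the pattern and text below are hypothetical examples, not anything shipped with Lite:

```python
import re

# What the model produced; this full text stays in the AI's context.
raw_text = "Alice <whispers>the password is hunter2</whispers> and smiles."

# A display-only rule: hide anything inside <whispers>...</whispers> tags.
display_text = re.sub(r"<whispers>.*?</whispers>", "[redacted]", raw_text)

print(display_text)  # Alice [redacted] and smiles.
# raw_text is untouched and is what gets sent back to the model.
```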

Known issue: Auto backend selection does not work correctly with CLBlast; a hotfix is being prepared to resolve this.

To use, download and run the koboldcpp.exe, which is a one-file PyInstaller build.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller.
If you have an Nvidia GPU but an old CPU and koboldcpp.exe does not work, try koboldcpp_oldcpu.exe.
If you have a newer Nvidia GPU, you can use the CUDA 12 version koboldcpp_cu12.exe (much larger, slightly faster).
If you're using Linux, select the appropriate Linux binary instead (not an exe).
If you're on a modern macOS (M1, M2, M3), you can try the koboldcpp-mac-arm64 binary.
If you're using AMD, you can try the koboldcpp_rocm build from YellowRoseCx's fork.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once the model is loaded, you can connect at http://localhost:5001 (or use the full KoboldAI client).
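
You can also drive the server programmatically once it's up. Below is a minimal sketch using Python's requests library against the KoboldAI API that koboldcpp serves; the endpoint and field names follow that API, but consult your build's documentation for the exact schema:

```python
import requests

BASE = "http://localhost:5001"

# Ask the server which model it has loaded.
print(requests.get(f"{BASE}/api/v1/model").json())

# Request a short completion from the generate endpoint.
payload = {
    "prompt": "The quick brown fox",
    "max_length": 50,     # number of tokens to generate
    "temperature": 0.7,
}
resp = requests.post(f"{BASE}/api/v1/generate", json=payload)
print(resp.json()["results"][0]["text"])
```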

For more information, be sure to run the program from the command line with the --help flag.
