koboldcpp-1.67
Hands free edition
- NEW: Integrated Whisper.cpp into KoboldCpp. This can be used from Kobold Lite for speech-to-text (see below). You can obtain a whisper model from the whisper.cpp repo links or download one mirrored here
- Two new endpoints are added: `/api/extra/transcribe`, used by KoboldCpp, and the OpenAI-compatible drop-in `/v1/audio/transcriptions`. Both endpoints accept payloads as .wav files (max 32MB) or base64-encoded wave data; please check the KoboldCpp API docs for more info (a request sketch follows below).
- Can be used in Kobold Lite, which uses the microphone when enabled in the settings panel. You can use Push-To-Talk (PTT) or automatic Voice Activity Detection (VAD), aka Hands Free Mode. Everything runs locally within your browser, including resampling and wav format conversion, and interfaces directly with the KoboldCpp transcription endpoint.
- Special thanks to @ggerganov and all the developers of whisper.cpp, without which none of this would have been possible.
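
As a rough sketch of how a client outside the browser might call the new endpoint, using the base64 option described above (the `audio_data` JSON field name is an assumption; consult the KoboldCpp API docs for the exact request schema):

```python
import base64
import requests  # assumes the `requests` package is installed

# Read a local .wav recording and base64-encode it, matching the
# "base64 encoded wave data" option described above.
with open("recording.wav", "rb") as f:
    wav_b64 = base64.b64encode(f.read()).decode("utf-8")

# NOTE: the "audio_data" field name is an assumption; check the
# KoboldCpp API docs for the exact schema.
resp = requests.post(
    "http://localhost:5001/api/extra/transcribe",
    json={"audio_data": wav_b64},
    timeout=120,
)
print(resp.json())  # should contain the transcribed text
```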
- NEW: You can now utilize the Quantized KV Cache feature in KoboldCpp with `--quantkv [level]`, where level 0=f16, 1=q8, 2=q4. Note that quantized KV cache is only available if `--flashattention` is used, and is NOT compatible with Context Shifting, which will be disabled if `--quantkv` is used (see the example launch line below).
- Merged improvements and fixes from upstream, including new MOE support for Vulkan by @0cc4m
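
For illustration, a launch line that enables quantized KV cache at q8 might look like this (the model filename is a placeholder):

```
koboldcpp.exe --model mymodel.gguf --flashattention --quantkv 1
```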
- Fixed a bug with stable diffusion generating blank images in CPU mode.
- Updated Kobold Lite:
- Speech-To-Text features have been added, see above.
- Tavern Cards can now be imported in Instruct mode. Enable "Show Advanced Load" for this option.
- Logit Bias editor now has a built-in tokenizer for strings when used with KoboldCpp.
- Fixed world info trigger probability, added escape button to close popups, fixed Cohere preamble dialog, fixed password input field sizes, various other bugfixes.
To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
If you have a newer Nvidia GPU, you can use the CUDA 12 version koboldcpp_cu12.exe (much larger, slightly faster).
If you're using Linux, select the appropriate Linux binary file instead (not exe).
If you're using AMD, you can try koboldcpp_rocm at YellowRoseCx's fork here
Run it from the command line with the desired launch parameters (see `--help`), or manually select the model in the GUI. Once loaded, you can connect like this (or use the full koboldai client): http://localhost:5001
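
For example, a minimal Python client hitting the standard KoboldAI-compatible generate route could look like this (the payload is kept minimal; see the API docs for the full schema):

```python
import requests  # assumes the `requests` package is installed

# Minimal generation request against the default local endpoint.
payload = {"prompt": "Once upon a time,", "max_length": 50}
resp = requests.post("http://localhost:5001/api/v1/generate", json=payload, timeout=120)

# The KoboldAI-compatible API returns generated text under results[0].text.
print(resp.json()["results"][0]["text"])
```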
For more information, be sure to run the program from the command line with the `--help` flag.