What's new in v0.25.0
This release brings OpenAI's two newest realtime voice models to Sokuji.
✨ New: OpenAI Translate provider
Dedicated speech-to-speech translation via OpenAI's brand-new gpt-realtime-translate model.
This isn't another general realtime model with a "you are a translator" prompt — it's a purpose-built translation endpoint that streams translated audio continuously alongside synchronized source and target transcripts. No risk of the model interpreting your speech as a question and answering it instead of translating, no server-side turn detection to fight, and noticeably lower latency than chaining transcription + chat + TTS.
- 13 target languages · 75 source languages (auto-detected)
- Both WebSocket and WebRTC transports
- Push-to-Translate Speech Mode supported
- Independent Source pause and Translation pause sliders (0.1s – 3.0s) for fine-tuning how the UI splits messages
- Participant translation supported, with a warning when the selected source language falls outside the 13 model-supported targets
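The target-language choice and the two pause sliders ultimately come down to per-session configuration. Here is a minimal sketch of what such a payload might look like; only the model name comes from these notes, and the field names (`target_language`, `source_pause`, `translation_pause`) are illustrative assumptions, not the real schema:

```python
def build_translate_session(target_language: str,
                            source_pause_s: float = 0.5,
                            translation_pause_s: float = 0.5) -> dict:
    """Build a hypothetical session.update-style payload for the
    gpt-realtime-translate model. Field names are assumptions."""
    # Clamp the pause values to the UI slider range (0.1s - 3.0s).
    def clamp(v: float) -> float:
        return min(max(v, 0.1), 3.0)

    return {
        "type": "session.update",
        "session": {
            "model": "gpt-realtime-translate",
            "target_language": target_language,  # one of the 13 targets
            # Source language is auto-detected across 75 languages,
            # so no source-language field is needed here.
            "source_pause": clamp(source_pause_s),
            "translation_pause": clamp(translation_pause_s),
        },
    }

cfg = build_translate_session("ja", source_pause_s=5.0)
print(cfg["session"]["source_pause"])  # 5.0 is clamped down to 3.0
```

The clamping mirrors the slider bounds above: values outside 0.1s to 3.0s are pulled back into range rather than rejected.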
🚀 New realtime model: gpt-realtime-2
gpt-realtime-2 is now selectable for the OpenAI provider, bringing the new reasoning.effort parameter (minimal → low → medium → high → xhigh). Crank it up for high-stakes translation; dial it down for latency-sensitive conversation. Same model, configurable per session.
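The dotted name suggests effort lives under a `reasoning` object in the session config, though the exact schema is an assumption here. A sketch of per-session configuration with the five levels from these notes:

```python
# The five effort levels named in the release notes, lowest to highest.
EFFORT_LEVELS = ["minimal", "low", "medium", "high", "xhigh"]

def session_with_effort(effort: str) -> dict:
    """Build a hypothetical session payload for gpt-realtime-2 with a
    per-session reasoning effort. Nesting under "reasoning" is assumed
    from the dotted parameter name reasoning.effort."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown reasoning effort: {effort}")
    return {
        "type": "session.update",
        "session": {
            "model": "gpt-realtime-2",
            "reasoning": {"effort": effort},
        },
    }

# High effort for a high-stakes translation session:
print(session_with_effort("high")["session"]["reasoning"]["effort"])  # high
```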
The transcription-only gpt-realtime-whisper is filtered out of the voice-agent model list — it doesn't fit the end-to-end speech-to-speech flow.
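The filtering itself is straightforward; a sketch of the idea, with the model-ID list and set names as illustrative assumptions:

```python
# All realtime model IDs a provider might report (illustrative list).
AVAILABLE_MODELS = [
    "gpt-realtime-2",
    "gpt-realtime-translate",
    "gpt-realtime-whisper",
]

# Transcription-only models cannot drive the end-to-end
# speech-to-speech flow, so they are excluded from the picker.
TRANSCRIPTION_ONLY = {"gpt-realtime-whisper"}

voice_agent_models = [m for m in AVAILABLE_MODELS
                      if m not in TRANSCRIPTION_ONLY]
print(voice_agent_models)  # whisper variant is filtered out
```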
🌐 Localization
UI strings for the new OpenAI Translate provider have been added across all 30 supported locales.
Full Changelog: v0.24.0...v0.25.0