What's new in 2.5.0 (2026-04-12)

These are the changes in inference v2.5.0.

New features

feat(sglang): support qwen3.5 by @llyycchhee in #4763
FEAT: reconnect and reconstruct model replicas after supervisor restart by @leslie2046 in #4731
feat(audio): support qwen3-tts by @llyycchhee in #4781
FEAT: [model] Qwen3-TTS-12Hz-1.7B-Base support by @llyycchhee in #4776
FEAT: [model] Qwen3-TTS-12Hz-0.6B-Base support by @llyycchhee in #4777
FEAT: [model] Qwen3-TTS-12Hz-1.7B-CustomVoice support by @llyycchhee in #4778
FEAT: [model] Qwen3-TTS-12Hz-0.6B-CustomVoice support by @llyycchhee in #4779
FEAT: [model] Qwen3-TTS-12Hz-1.7B-VoiceDesign support by @llyycchhee in #4780
FEAT(webui): add localstorage management for model deploy configuration by @leslie2046 in #4739
FEAT: [model] gemma-4 support by @qinxuye in #4768

Enhancements

ENH: update model "DeepSeek-OCR" JSON by @amumu96 in #4751
ENH: update 2 models JSON ("Ernie4.5", "qwen3.5") by @XprobeBot in #4754
ENH: update model "DeepSeek-V3.2" JSON by @amumu96 in #4762
ENH: update 2 models JSON ("Qwen3-ASR-0.6B", "Qwen3-ASR-1.7B") by @qinxuye in #4765
ENH: auto-detect PyTorch CUDA version for virtual environment setup by @qinxuye in #4766
ENH: update model "jina-embeddings-v4" JSON by @qinxuye in #4775
ENH: Optimize worker details for deployment progress tooltip. by @leslie2046 in #4746
ENH: update model "qwen3.5" JSON by @llyycchhee in #4782
ENH: update 2 models JSON ("Kokoro-82M-v1.1-zh", "Kokoro-82M") by @qinxuye in #4795
ENH: update model "gemma-3-it" JSON by @qinxuye in #4794
ENH: update models JSON [llm] by @XprobeBot in #4796
ENH: add lightweight heartbeat mechanism for worker liveness detection by @qinxuye in #4785
ENH: update model "ChatTTS" JSON by @qinxuye in #4793
bld: Fix the front-end UI access issue for aarch64 image by @zwt-1234 in #4743
bld: Fix the front-end UI access issue for aarch64 image by @zwt-1234 in #4749
bld: Fix the front-end UI access issue by @zwt-1234 in #4758
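
As a rough illustration of the heartbeat-based liveness detection mentioned in #4785, the sketch below tracks only a last-seen timestamp per worker and flags workers whose latest heartbeat is older than a timeout. The `HeartbeatMonitor` class, its method names, and the timeout value are hypothetical and are not taken from the project's code; the actual mechanism may differ.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Hypothetical helper: records the last heartbeat time per worker."""

    def __init__(
        self,
        timeout: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._timeout = timeout
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_seen: Dict[str, float] = {}

    def beat(self, worker: str) -> None:
        # A heartbeat carries no status payload; only the arrival time is stored.
        self._last_seen[worker] = self._clock()

    def dead_workers(self) -> List[str]:
        # A worker is considered dead once its last heartbeat exceeds the timeout.
        now = self._clock()
        return sorted(
            worker
            for worker, seen in self._last_seen.items()
            if now - seen > self._timeout
        )
```

A supervisor-side loop could call `dead_workers()` periodically and trigger recovery for the returned names, assuming workers call `beat()` at an interval well below the timeout.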

Bug fixes

fix: use constant-time comparison for auth credentials (CWE-208) by @spidershield-contrib in #4734
bug: fix qwen3 reranker vllm precision by @ZhikaiGuo960110 in #4747
fix: add variable to control template for Qwen3 Reranker Family by @ZhikaiGuo960110 in #4752
BUG: Fix Qwen3.5 wrong tag in streaming API by @la1ty in #4759
BUG: Fix Jinja template error for models using {% break %} tag (e.g. Kimi K2.5) by @amumu96 in #4770
BUG: fix qwen3-vl embedding model for vllm engine by @llyycchhee in #4783
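
For context on the constant-time comparison fix in #4734: comparing secrets with `==` can return as soon as the first differing character is found, which can leak timing information (CWE-208). A minimal Python sketch of the general technique uses the standard library's `hmac.compare_digest`. The `credentials_match` helper name is hypothetical and this is not necessarily how the project implements the fix.

```python
import hmac


def credentials_match(supplied: str, expected: str) -> bool:
    """Hypothetical helper: compare two secrets without early exit."""
    # Encode to bytes first: compare_digest only accepts str arguments
    # when they are ASCII-only, while bytes work for any input.
    return hmac.compare_digest(
        supplied.encode("utf-8"),
        expected.encode("utf-8"),
    )
```

The helper can replace a plain `supplied == expected` check wherever a password, token, or API key is verified.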

Fix #4597: [Bug] v2.0.0 Docker image: ImportError (circular import) a... by @JiwaniZakir in #4757

Full Changelog: v2.4.0...v2.5.0