What's new in 2.5.0 (2026-04-12)
These are the changes in inference v2.5.0.
New features
- feat(sglang): support qwen3.5 by @llyycchhee in #4763
- FEAT: reconnect and reconstruct model replicas after supervisor restart by @leslie2046 in #4731
- feat(audio): support qwen3-tts by @llyycchhee in #4781
- FEAT: [model] Qwen3-TTS-12Hz-1.7B-Base support by @llyycchhee in #4776
- FEAT: [model] Qwen3-TTS-12Hz-0.6B-Base support by @llyycchhee in #4777
- FEAT: [model] Qwen3-TTS-12Hz-1.7B-CustomVoice support by @llyycchhee in #4778
- FEAT: [model] Qwen3-TTS-12Hz-0.6B-CustomVoice support by @llyycchhee in #4779
- FEAT: [model] Qwen3-TTS-12Hz-1.7B-VoiceDesign support by @llyycchhee in #4780
- FEAT(webui): add localStorage management for model deployment configurations by @leslie2046 in #4739
- FEAT: [model] gemma-4 support by @qinxuye in #4768
Enhancements
- ENH: update model "DeepSeek-OCR" JSON by @amumu96 in #4751
- ENH: update 2 models JSON ("Ernie4.5", "qwen3.5") by @XprobeBot in #4754
- ENH: update model "DeepSeek-V3.2" JSON by @amumu96 in #4762
- ENH: update 2 models JSON ("Qwen3-ASR-0.6B", "Qwen3-ASR-1.7B") by @qinxuye in #4765
- ENH: auto-detect PyTorch CUDA version for virtual environment setup by @qinxuye in #4766
- ENH: update model "jina-embeddings-v4" JSON by @qinxuye in #4775
- ENH: optimize worker details in the deployment progress tooltip by @leslie2046 in #4746
- ENH: update model "qwen3.5" JSON by @llyycchhee in #4782
- ENH: update 2 models JSON ("Kokoro-82M-v1.1-zh", "Kokoro-82M") by @qinxuye in #4795
- ENH: update model "gemma-3-it" JSON by @qinxuye in #4794
- ENH: update models JSON [llm] by @XprobeBot in #4796
- ENH: add lightweight heartbeat mechanism for worker liveness detection by @qinxuye in #4785
- ENH: update model "ChatTTS" JSON by @qinxuye in #4793
- bld: Fix the front-end UI access issue for aarch64 image by @zwt-1234 in #4743
- bld: Fix the front-end UI access issue for aarch64 image by @zwt-1234 in #4749
- bld: Fix the front-end UI access issue by @zwt-1234 in #4758
Bug fixes
- fix: use constant-time comparison for auth credentials (CWE-208) by @spidershield-contrib in #4734
- bug: fix Qwen3 reranker precision on the vLLM engine by @ZhikaiGuo960110 in #4747
- fix: add a variable to control the template for the Qwen3 Reranker family by @ZhikaiGuo960110 in #4752
- BUG: Fix wrong Qwen3.5 tag in streaming API by @la1ty in #4759
- BUG: Fix Jinja template error for models using {% break %} tag (e.g. ……Kimi K2.5) by @amumu96 in #4770
- BUG: fix Qwen3-VL embedding model for the vLLM engine by @llyycchhee in #4783
Documentation
Others
- Fix #4597: [Bug] v2.0.0 Docker image: ImportError (circular import) a... by @JiwaniZakir in #4757
New Contributors
- @spidershield-contrib made their first contribution in #4734
- @JiwaniZakir made their first contribution in #4757
Full Changelog: v2.4.0...v2.5.0