What's new in 2.11.0 (2026-06-19)
These are the changes in inference v2.11.0.
New features
- feat: new UI part 3 by @maoyuehui in #5011
- feat: add MiniCPM5-1B model support by @xiaoyesoso in #5010
- feat(ui): add change password for local users in user management by @m199369309 in #5014
- feat: add jina-embeddings-v5 series support by @xiaoyesoso in #5018
- feat: add Gradio test page for embedding models by @xiaoyesoso in #5022
- feat: add MiniCPM-V-4.6 series support by @xiaoyesoso in #5025
- feat: add Tencent Hy-MT2 series support by @xiaoyesoso in #5029
- feat(vllm): support glm5 tool parser by @llyycchhee in #5019
- feat(core): add model loading state machine to prevent routing to loading models by @m199369309 in #5032
- feat(audio): add VoxCPM2 model support by @bluefish-08 in #5045
- feat: new UI part 4 by @maoyuehui in #5040
- FEAT: [model] PaddleOCR-VL-1.6 support by @llyycchhee in #5033
- feat: add random serving benchmark workload by @qinxuye in #5036
- feat: new UI part 5 by @maoyuehui in #5053
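As an illustration of the model loading state machine entry above (#5032), here is a minimal sketch of how tracking a per-replica state can keep requests away from models that are still loading. It is not the code from that PR: the state names, the `ModelRegistry` class, and its methods are all hypothetical and chosen only to show the idea.

```python
from enum import Enum


class ModelState(Enum):
    # Hypothetical state names; the actual states used in #5032 may differ.
    LOADING = "loading"
    READY = "ready"
    FAILED = "failed"


# Allowed transitions: a replica must finish loading before it can serve.
_TRANSITIONS = {
    ModelState.LOADING: {ModelState.READY, ModelState.FAILED},
    ModelState.READY: {ModelState.FAILED},
    ModelState.FAILED: {ModelState.LOADING},
}


class ModelRegistry:
    """Hypothetical registry that exposes only READY replicas to a router."""

    def __init__(self):
        self._states = {}

    def register(self, replica_id):
        # Every newly registered replica starts in LOADING.
        self._states[replica_id] = ModelState.LOADING

    def transition(self, replica_id, new_state):
        current = self._states[replica_id]
        if new_state not in _TRANSITIONS[current]:
            raise ValueError(
                f"illegal transition {current.name} -> {new_state.name}"
            )
        self._states[replica_id] = new_state

    def routable(self):
        # Replicas still LOADING (or FAILED) are never returned.
        return [r for r, s in self._states.items() if s is ModelState.READY]
```

With this shape, a router that only consults `routable()` cannot pick a replica until something has explicitly moved it to `READY`.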
Enhancements
- ENH: update models JSON [llm] by @XprobeBot in #5017
- ENH: update models JSON [embedding] by @XprobeBot in #5020
- ENH: update models JSON [llm] by @XprobeBot in #5026
- ENH: update models JSON [llm] by @XprobeBot in #5034
- ENH: update models JSON [llm] by @XprobeBot in #5038
- ENH: update models JSON [audio] by @XprobeBot in #5050
- bld: Replace self-hosted runner with GitHub hosted runner and update actions by @qinrui777 in #5000
Bug fixes
- fix(supervisor): self-heal worker registry after supervisor restart by @m199369309 in #4998
- fix(metrics): drop stale series and add unexpected termination gauge by @m199369309 in #4999
- fix(worker): ensure subpool monitor starts after append_sub_pool by @m199369309 in #5004
- fix(model_registration): add flexible type by @llyycchhee in #5007
- fix(model): use os._exit for OOM to trigger pool recovery by @m199369309 in #5005
- fix(venv): add sentence_transformers dependency by @llyycchhee in #5009
- fix(supervisor): evict dead replica from round-robin when auto-recover exhausted by @m199369309 in #5006
- fix(worker): pop launch_ts in recover_model to avoid strict-constructor crashes by @m199369309 in #5012
- fix(deploy): fix dockerfile.cpu kernels version by @llyycchhee in #5013
- fix(logs): fix empty node filter dropdown and redesign toolbar layout by @m199369309 in #5015
- fix(monitor): correct alert rule labels/expressions and add 4 new rules by @m199369309 in #5016
- fix: pin FastAPI below 0.137 for metrics middleware by @qinxuye in #5042
- fix(core): sync model_serve_count gauge on stream request completion by @m199369309 in #5041
- fix(vllm): force spawn for single-GPU EngineCore and fix health check scheduling by @m199369309 in #5030
- fix(worker): strip test envs before recover and clean up GPU orphans by @m199369309 in #5031
- fix(rerank/llama.cpp): honor flat launch kwargs (n_ctx/n_batch/n_ubatch) so rerankers aren't capped at default ubatch (#5001) by @Anai-Guo in #5046
- fix(anthropic): honor top-level system prompt and inline system messages (#5037) by @Anai-Guo in #5049
- fix(vllm/patches): auto-apply hybrid KV cache patch to Qwen3-Next by @Anai-Guo in #5052
- fix(audio): warn and ignore non-zero temperature for Qwen3-ASR instead of raising by @Anai-Guo in #5054
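To make the round-robin eviction fix above (#5006) more concrete, the following is a rough sketch of removing a dead replica from a rotation without disturbing the order of the remaining ones. It assumes a simple in-memory list and is not taken from the supervisor code; `RoundRobinPool`, `pick`, and `evict` are hypothetical names.

```python
class RoundRobinPool:
    """Hypothetical round-robin selector with eviction of dead replicas."""

    def __init__(self, replicas):
        self._replicas = list(replicas)
        self._next = 0  # index of the replica to hand out next

    def pick(self):
        if not self._replicas:
            raise RuntimeError("no live replicas")
        replica = self._replicas[self._next]
        self._next = (self._next + 1) % len(self._replicas)
        return replica

    def evict(self, replica):
        # Called once recovery attempts for this replica are exhausted.
        if replica not in self._replicas:
            return
        idx = self._replicas.index(replica)
        self._replicas.pop(idx)
        # Shift the cursor back if the removed slot was before it, so the
        # replica that was due next is still the one handed out next.
        if idx < self._next:
            self._next -= 1
        self._next = self._next % len(self._replicas) if self._replicas else 0
```

For example, with replicas `r0`, `r1`, `r2`, after one `pick()` and then `evict("r1")`, later picks alternate between `r2` and `r0` only.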
Documentation
- doc: add v2.10.0 release notes by @qinxuye in #4997
- doc: hide Zhihu link from English docs by @qinxuye in #5008
- doc: update documentation logo and favicon by @qinxuye in #5024
- doc: update Read the Docs build image by @qinxuye in #5044
New Contributors
- @qinrui777 made their first contribution in #5000
- @xiaoyesoso made their first contribution in #5010
- @bluefish-08 made their first contribution in #5045
- @Anai-Guo made their first contribution in #5046
Full Changelog: v2.10.0...v2.11.0