What's new in 2.11.0 (2026-06-19)

These are the changes in inference v2.11.0.

New features

feat: new UI part 3 by @maoyuehui in #5011
feat: add MiniCPM5-1B model support by @xiaoyesoso in #5010
feat(ui): add change password for local users in user management by @m199369309 in #5014
feat: add jina-embeddings-v5 series support by @xiaoyesoso in #5018
feat: add Gradio test page for embedding models by @xiaoyesoso in #5022
feat: add MiniCPM-V-4.6 series support by @xiaoyesoso in #5025
feat: add Tencent Hy-MT2 series support by @xiaoyesoso in #5029
feat(vllm): support glm5 tool parser by @llyycchhee in #5019
feat(core): add model loading state machine to prevent routing to loading models by @m199369309 in #5032
feat(audio): add VoxCPM2 model support by @bluefish-08 in #5045
feat: new UI part 4 by @maoyuehui in #5040
FEAT: [model] PaddleOCR-VL-1.6 support by @llyycchhee in #5033
feat: add random serving benchmark workload by @qinxuye in #5036
feat: new UI part 5 by @maoyuehui in #5053
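
The notes describe #5032 only by its title: a model loading state machine that keeps requests from being routed to models that are still loading. As a rough illustration of that idea, and not the project's actual code, the sketch below tracks a state per model and exposes only ready models to the router. The names `ModelState`, `ModelRouter`, and the set of allowed transitions are all hypothetical.

```python
from enum import Enum


class ModelState(Enum):
    LOADING = "loading"
    READY = "ready"
    FAILED = "failed"


# Hypothetical transition table; the real implementation may use other states.
_ALLOWED = {
    ModelState.LOADING: {ModelState.READY, ModelState.FAILED},
    ModelState.READY: {ModelState.FAILED},
    ModelState.FAILED: {ModelState.LOADING},
}


class ModelRouter:
    """Tracks per-model state and only routes to models that are READY."""

    def __init__(self):
        self._states = {}

    def register(self, model_uid):
        # A newly launched model starts in LOADING and is not routable yet.
        self._states[model_uid] = ModelState.LOADING

    def transition(self, model_uid, new_state):
        current = self._states[model_uid]
        if new_state not in _ALLOWED[current]:
            raise ValueError(
                f"illegal transition {current.name} -> {new_state.name}"
            )
        self._states[model_uid] = new_state

    def routable(self):
        # Models in LOADING or FAILED are excluded from routing.
        return [uid for uid, s in self._states.items() if s is ModelState.READY]
```

Making the transition table explicit means an unexpected state change raises an error instead of silently making a half-loaded model routable.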

Enhancements

ENH: update models JSON [llm] by @XprobeBot in #5017
ENH: update models JSON [embedding] by @XprobeBot in #5020
ENH: update models JSON [llm] by @XprobeBot in #5026
ENH: update models JSON [llm] by @XprobeBot in #5034
ENH: update models JSON [llm] by @XprobeBot in #5038
ENH: update models JSON [audio] by @XprobeBot in #5050
bld: Replace self-hosted runner with GitHub hosted runner and update actions by @qinrui777 in #5000

Bug fixes

fix(supervisor): self-heal worker registry after supervisor restart by @m199369309 in #4998
fix(metrics): drop stale series and add unexpected termination gauge by @m199369309 in #4999
fix(worker): ensure subpool monitor starts after append_sub_pool by @m199369309 in #5004
fix(model_registration): add flexible type by @llyycchhee in #5007
fix(model): use os._exit for OOM to trigger pool recovery by @m199369309 in #5005
fix(venv): add sentence_transformers dependency by @llyycchhee in #5009
fix(supervisor): evict dead replica from round-robin when auto-recover exhausted by @m199369309 in #5006
fix(worker): pop launch_ts in recover_model to avoid strict-constructor crashes by @m199369309 in #5012
fix(deploy): fix dockerfile.cpu kernels version by @llyycchhee in #5013
fix(logs): fix empty node filter dropdown and redesign toolbar layout by @m199369309 in #5015
fix(monitor): correct alert rule labels/expressions and add 4 new rules by @m199369309 in #5016
fix: pin FastAPI below 0.137 for metrics middleware by @qinxuye in #5042
fix(core): sync model_serve_count gauge on stream request completion by @m199369309 in #5041
fix(vllm): force spawn for single-GPU EngineCore and fix health check scheduling by @m199369309 in #5030
fix(worker): strip test envs before recover and clean up GPU orphans by @m199369309 in #5031
fix(rerank/llama.cpp): honor flat launch kwargs (n_ctx/n_batch/n_ubatch) so rerankers aren't capped at default ubatch (#5001) by @Anai-Guo in #5046
fix(anthropic): honor top-level system prompt and inline system messages (#5037) by @Anai-Guo in #5049
fix(vllm/patches): auto-apply hybrid KV cache patch to Qwen3-Next by @Anai-Guo in #5052
fix(audio): warn and ignore non-zero temperature for Qwen3-ASR instead of raising by @Anai-Guo in #5054
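
Several of the fixes above concern replica routing, for example #5006, which evicts a dead replica from round-robin once auto-recovery is exhausted. The sketch below shows the general technique under stated assumptions; it is not taken from the project. The class `RoundRobin` and its methods are hypothetical names.

```python
class RoundRobin:
    """Round-robin picker that supports evicting dead replicas."""

    def __init__(self, replicas):
        self._replicas = list(replicas)
        self._next = 0  # index of the replica to hand out next

    def evict(self, replica):
        # Remove a replica that will not recover, keeping the rotation order
        # of the remaining replicas intact.
        if replica not in self._replicas:
            return
        idx = self._replicas.index(replica)
        self._replicas.pop(idx)
        if idx < self._next:
            # Everything after idx shifted left by one position.
            self._next -= 1
        if self._replicas:
            self._next %= len(self._replicas)
        else:
            self._next = 0

    def pick(self):
        if not self._replicas:
            raise RuntimeError("no live replicas")
        replica = self._replicas[self._next]
        self._next = (self._next + 1) % len(self._replicas)
        return replica
```

Adjusting the cursor on eviction avoids skipping a healthy replica or indexing past the end of the shortened list.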

Documentation

doc: add v2.10.0 release notes by @qinxuye in #4997
doc: hide Zhihu link from English docs by @qinxuye in #5008
doc: update documentation logo and favicon by @qinxuye in #5024
doc: update Read the Docs build image by @qinxuye in #5044

Others

docs: add Telegram community links by @qinxuye in #5003

New Contributors

@qinrui777 made their first contribution in #5000
@xiaoyesoso made their first contribution in #5010
@bluefish-08 made their first contribution in #5045
@Anai-Guo made their first contribution in #5046

Full Changelog: v2.10.0...v2.11.0
