github xorbitsai/inference v2.11.0

4 hours ago

What's new in 2.11.0 (2026-06-19)

These are the changes in inference v2.11.0.

New features

Enhancements

Bug fixes

  • fix(supervisor): self-heal worker registry after supervisor restart by @m199369309 in #4998
  • fix(metrics): drop stale series and add unexpected termination gauge by @m199369309 in #4999
  • fix(worker): ensure subpool monitor starts after append_sub_pool by @m199369309 in #5004
  • fix(model_registration):add flexible type by @llyycchhee in #5007
  • fix(model): use os._exit for OOM to trigger pool recovery by @m199369309 in #5005
  • fix(venv): add sentence_transformers dependence by @llyycchhee in #5009
  • fix(supervisor): evict dead replica from round-robin when auto-recover exhausted by @m199369309 in #5006
  • fix(worker): pop launch_ts in recover_model to avoid strict-constructor crashes by @m199369309 in #5012
  • fix(deploy): fix dockerfile.cpu kernels version by @llyycchhee in #5013
  • fix(logs): fix empty node filter dropdown and redesign toolbar layout by @m199369309 in #5015
  • fix(monitor): correct alert rule labels/expressions and add 4 new rules by @m199369309 in #5016
  • fix: pin FastAPI below 0.137 for metrics middleware by @qinxuye in #5042
  • fix(core): sync model_serve_count gauge on stream request completion by @m199369309 in #5041
  • fix(vllm): force spawn for single-GPU EngineCore and fix health check scheduling by @m199369309 in #5030
  • fix(worker): strip test envs before recover and clean up GPU orphans by @m199369309 in #5031
  • fix(rerank/llama.cpp): honor flat launch kwargs (n_ctx/n_batch/n_ubatch) so rerankers aren't capped at default ubatch (#5001) by @Anai-Guo in #5046
  • fix(anthropic): honor top-level system prompt and inline system messages (#5037) by @Anai-Guo in #5049
  • fix(vllm/patches): auto-apply hybrid KV cache patch to Qwen3-Next by @Anai-Guo in #5052
  • fix(audio): warn and ignore non-zero temperature for Qwen3-ASR instead of raising by @Anai-Guo in #5054

Documentation

Others

New Contributors

Full Changelog: v2.10.0...v2.11.0

Don't miss a new inference release

NewReleases is sending notifications on new releases.