github xorbitsai/inference v2.8.0

What's new in 2.8.0 (2026-05-09)

These are the changes in inference v2.8.0.

New features

  • feat: TCP keepalive, reverse-channel probe, list_models cache, .pth atomic write, launch exemption by @m199369309 in #4845
  • feat: comprehensive Prometheus metrics for cluster, worker, and model observability by @m199369309 in #4857
  • feat: graceful OOM handling, worker model auto-recovery, and heartbeat log enhancement by @m199369309 in #4859
  • feat: integrate Grafana Dashboard monitoring into Web UI with Prometheus metrics enhancements by @m199369309 in #4868
  • feat: add modular vLLM post-install patch framework with hybrid KV cache fix by @m199369309 in #4879
  • feat: add Prometheus alert rules for cluster monitoring by @m199369309 in #4891
  • feat: move dashboard to monitor/ and update Grafana panels by @m199369309 in #4892
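The ".pth atomic write" item in #4845 is listed without detail. As a rough illustration of the general technique only (not the code from that PR), an atomic file write is commonly done by writing to a temporary file in the same directory and renaming it over the target. The helper name atomic_write_text and the ".tmp" naming below are hypothetical.

```python
import os
import tempfile


def atomic_write_text(path: str, text: str) -> None:
    """Write text to path so readers see the old or the new file, never a partial one.

    Hypothetical helper for illustration; not taken from the inference codebase.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the final rename
    # stays on one filesystem, which is what makes os.replace atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before the rename
        os.replace(tmp_path, path)  # atomically swap the new file into place
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A call such as atomic_write_text("example.pth", "/opt/extra/site-packages\n") either leaves the previous file untouched or replaces it in full, so a process that reads the file mid-update does not see truncated content.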

Enhancements

Bug fixes

  • fix: resolve Mixed Content blocking for Launch Web UI behind HTTPS proxy by @m199369309 in #4856
  • fix(vllm): support for v0.19.0 by @llyycchhee in #4862
  • fix: resolve host/model-venv transformers version conflict for embedding/rerank engines by @m199369309 in #4864
  • fix: correct PromQL operator precedence in Grafana dashboard panels by @m199369309 in #4873
  • fix: Fix vLLM /v1/completions TypeError in Qwen3.5 by @la1ty in #4874
  • fix: improve Grafana dashboard panel layout, PromQL expressions, and i18n by @m199369309 in #4876
  • fix: Qwen3.5 tool, ERROR Can't parse single qwen tool call output by @bleakie in #4870
  • fix: differentiate jina-embeddings v3/v4 task parameter handling by @m199369309 in #4881
  • fix: Set default enable-thinking option in command line to False by @la1ty in #4875
  • fix: upgrade TTFT metric to Histogram with stream label and add non-stream TTFT recording by @m199369309 in #4890

Documentation

New Contributors

Full Changelog: v2.7.0...v2.8.0
