What's new in 2.8.0 (2026-05-09)
These are the changes in inference v2.8.0.
New features
- feat: TCP keepalive, reverse-channel probe, list_models cache, .pth atomic write, launch exemption by @m199369309 in #4845
- feat: comprehensive Prometheus metrics for cluster, worker, and model observability by @m199369309 in #4857
- feat: graceful OOM handling, worker model auto-recovery, and heartbeat log enhancement by @m199369309 in #4859
- feat: integrate Grafana Dashboard monitoring into Web UI with Prometheus metrics enhancements by @m199369309 in #4868
- feat: add modular vLLM post-install patch framework with hybrid KV cache fix by @m199369309 in #4879
- feat: add Prometheus alert rules for cluster monitoring by @m199369309 in #4891
- feat: move dashboard to monitor/ and update Grafana panels by @m199369309 in #4892
Enhancements
- ENH: update models JSON [embedding] by @XprobeBot in #4867
- ENH: update models JSON [llm] by @XprobeBot in #4871
- ENH: update model "GLM-4.6" JSON by @amumu96 in #4878
- bld: avoid libc6 upgrade during arm64 Docker build by @zwt-1234 in #4885
- bld: Restrict non-main release tags to admins by @qinxuye in #4887
- REF: refactor tool parser check logic & fix qwen3.5 & qwen3.6 & gemma-4 reasoning parser by @llyycchhee in #4866
Bug fixes
- fix: resolve Mixed Content blocking for Launch Web UI behind HTTPS proxy by @m199369309 in #4856
- fix(vllm): support for v0.19.0 by @llyycchhee in #4862
- fix: resolve host/model-venv transformers version conflict for embedding/rerank engines by @m199369309 in #4864
- fix: correct PromQL operator precedence in Grafana dashboard panels by @m199369309 in #4873
- fix: Fix vLLM /v1/completions TypeError in Qwen3.5 by @la1ty in #4874
- fix: improve Grafana dashboard panel layout, PromQL expressions, and i18n by @m199369309 in #4876
- fix: Qwen3.5 tool, ERROR Can't parse single qwen tool call output by @bleakie in #4870
- fix: differentiate jina-embeddings v3/v4 task parameter handling by @m199369309 in #4881
- fix: Set default enable-thinking option in command line to False by @la1ty in #4875
- fix: upgrade TTFT metric to Histogram with stream label and add non-stream TTFT recording by @m199369309 in #4890
Documentation
New Contributors
Full Changelog: v2.7.0...v2.8.0