What's new in 2.10.0 (2026-06-05)
These are the changes in inference v2.10.0.
New features
- feat(logging): enhance logging system with JSON format and stdout redirect by @m199369309 in #4947
- feat(auth): add OIDC/Keycloak SSO authentication support by @m199369309 in #4948
- feat(audit): add comprehensive audit logging system by @m199369309 in #4951
- feat(security): add IP/Key ban and rate limiting for brute-force protection by @m199369309 in #4949
- feat(apikey): add description field and ban status display by @m199369309 in #4952
- feat(monitor): add security audit panels and filebeat configurations by @m199369309 in #4953
- feat(ui): menu reorganization, fetchWrapper auth, and i18n updates by @m199369309 in #4954
- feat(monitor): add per-model GPU memory usage metrics by @m199369309 in #4965
- feat(monitor): update Grafana dashboards with GPU memory panels by @m199369309 in #4969
- feat: persist launch model configuration history server-side by @m199369309 in #4972
- FEAT: [UI] update sidebar, login logo and favicon by @yiboyasss in #4978
- feat: new ui (register json view, formInstance, launch model list, cache/env …) by @maoyuehui in #4966
- feat(logging): add three-level download progress logging by @m199369309 in #4989
- feat(ui): allow editing API key name and description in edit dialog by @m199369309 in #4991
Enhancements
- ENH: update model "qwen3.6" JSON by @llyycchhee in #4945
- ENH: update model "ChatTTS" JSON by @llyycchhee in #4961
- ENH: update models JSON [embedding] by @XprobeBot in #4971
- ENH: update model "qwen3.6" JSON by @llyycchhee in #4994
Bug fixes
- fix(vllm): set quantization="fp8" when model_format is fp8 by @m199369309 in #4959
- fix(auth): return specific error messages for expired/disabled API keys by @m199369309 in #4963
- fix(monitor): periodic refresh for security gauges and ban remaining API by @m199369309 in #4964
- BUG: fix jina-embeddings-v2-base-zh deployment dependencies by @m199369309 in #4970
- fix(monitor): capture vLLM/SGLang GPU workers via deferred PID tattoo by @m199369309 in #4977
- fix(vllm): remove best_of for v0.21.0 by @llyycchhee in #4979
- bug: Adapt vLLM LoRA request path parameter by @amumu96 in #4980
- fix(logging): strip all CSI escapes and route flush() through sampling by @m199369309 in #4983
- fix(vllm): read json_schema from schema_ so guided decoding applies by @m199369309 in #4985
- bug: fix llama.cpp streaming tool call edge cases by @qinxuye in #4988
- fix: Fix GPU info probe on GB10 / DGX Spark (NVML v2 memory-info fallback) by @tbraun96 in #4990
- fix(worker): limit concurrent model launches with semaphore to prevent heartbeat timeouts by @m199369309 in #4992
- fix(venv): allow user override by @llyycchhee in #4993
- fix(venv): evaluate CUDA version markers dynamically by @llyycchhee in #4958
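
As an illustration of the worker fix in #4992, the general technique of capping concurrent launches with a semaphore can be sketched as below. This is a minimal sketch and not the project's actual implementation: `launch_model`, `launch_all`, and `MAX_CONCURRENT_LAUNCHES` are hypothetical names, and the sleep stands in for real model loading.

```python
import asyncio

# Hypothetical limit; the real worker's setting and default are not stated in these notes.
MAX_CONCURRENT_LAUNCHES = 2


async def launch_model(name, sem, state):
    """Pretend to launch one model while holding a semaphore slot."""
    async with sem:  # waits here when all slots are taken
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)  # stand-in for the expensive load step
        state["active"] -= 1
    return name


async def launch_all(names, limit=MAX_CONCURRENT_LAUNCHES):
    """Launch every model, never running more than `limit` launches at once."""
    sem = asyncio.Semaphore(limit)
    state = {"active": 0, "peak": 0}
    launched = await asyncio.gather(
        *(launch_model(n, sem, state) for n in names)
    )
    # `peak` records the highest number of simultaneous launches observed.
    return list(launched), state["peak"]


if __name__ == "__main__":
    models, peak = asyncio.run(launch_all(["a", "b", "c", "d", "e"]))
    print(models, "peak concurrent launches:", peak)
```

Because excess launches wait for a free slot instead of all starting together, the event loop stays free for other periodic work such as heartbeats, which is the failure mode the fix title describes.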
Documentation
- DOC: add v2.9.0 release notes by @qinxuye in #4941
- doc: Add AI agent guidance for the project by @qinxuye in #4920
- doc: remove WeChat QR links from docs site by @qinxuye in #4967
Others
- chattts: set weights_only=True for torch.load speaker embedding by @tonghuaroot in #4956
New Contributors
- @tonghuaroot made their first contribution in #4956
- @tbraun96 made their first contribution in #4990
Full Changelog: v2.9.0...v2.10.0