xorbitsai/inference v2.7.0

What's new in 2.7.0 (2026-04-25)

These are the changes in inference v2.7.0.

New features

Enhancements

Bug fixes

  • fix: replace eval() with safe alternatives to prevent RCE in tool parsers by @Ricardo-M-L in #4786
  • fix: support JSON object parameters in CLI by @Ricardo-M-L in #4787
  • fix: support Jina API task parameters for jina-embeddings-v4 by @Ricardo-M-L in #4788
  • fix(ui): handle mixed dict and ChatMessage types in history by @qinxuye in #4814
  • fix(vllm): fix gemma-4 tool calls by @llyycchhee in #4815
  • fix(docker): unpin torchcodec to fix 503 error on reranker/embedding model load by @FlintyLemming in #4817
  • fix: handle missing 'cpu' key in get_cluster_device_info to prevent KeyError 500 by @m199369309 in #4822
  • fix: venv concurrent creation race, cold-start lock dir, and jina-embeddings-v4 torch mismatch by @m199369309 in #4823
  • fix: dynamic CUDA version check for extra_index_url by @Gmgge in #4820
  • fix: vLLM multi-node distributed init and pipeline parallel inference by @amumu96 in #4834
  • fix: venv torchvision alignment, supervisor RPC timeouts, get_model flood protection, replica pre-check, and safe log handler by @m199369309 in #4839
  • fix: remove last message role restriction in chat completion endpoint by @amumu96 in #4833
  • fix(security): prevent pwn-request vulnerability in gen_docs workflow by @qinxuye in #4850
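The eval() replacement in #4786 follows a standard hardening pattern: parse model-emitted tool-call arguments with data-only parsers instead of evaluating them. A minimal sketch of the idea (the function name is illustrative, not Xinference's actual API):

```python
import ast
import json


def parse_tool_arguments(raw: str):
    """Parse a model-emitted tool-call argument string without eval().

    eval() on untrusted model output allows remote code execution;
    json.loads and ast.literal_eval accept only data literals, never
    expressions or calls.
    """
    try:
        return json.loads(raw)  # standard JSON arguments
    except json.JSONDecodeError:
        pass
    try:
        return ast.literal_eval(raw)  # Python-literal style, e.g. {'a': 1}
    except (ValueError, SyntaxError):
        return None  # refuse anything that is not a plain literal
```

A payload like `__import__('os').system('...')` would execute under eval() but is rejected by both parsers here and simply returns None.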

Documentation

Others

  • refactor(device_utils): replace if/elif chains with DeviceSpec registry by @amumu96 in #4846
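The DeviceSpec refactor in #4846 replaces a growing if/elif chain with a lookup registry, so adding a device backend means registering one entry rather than editing every dispatch site. A minimal sketch of that pattern (class and function names here are illustrative, not the actual Xinference implementation):

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class DeviceSpec:
    """One registry entry per device backend (illustrative)."""
    name: str
    is_available: Callable[[], bool]


# Each backend is registered once; dispatch code never changes.
DEVICE_REGISTRY: Dict[str, DeviceSpec] = {}


def register_device(spec: DeviceSpec) -> None:
    DEVICE_REGISTRY[spec.name] = spec


register_device(DeviceSpec("cpu", lambda: True))
register_device(DeviceSpec("cuda", lambda: False))  # stub availability check


def detect_device() -> str:
    # Return the first available backend instead of walking an if/elif chain.
    for spec in DEVICE_REGISTRY.values():
        if spec.is_available():
            return spec.name
    return "cpu"
```

The design choice: dispatch logic stays closed to modification while the set of backends stays open to extension.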

New Contributors

Full Changelog: v2.5.0...v2.7.0
