What's new in 2.7.0 (2026-04-25)

These are the changes in inference v2.7.0.

New features

FEAT: support replica removing by @leslie2046 in #4784
feat(ui): optimize configuration cache dialog and terminology by @leslie2046 in #4807
FEAT: add DeepSeek V3.2 tool parser for DSML format by @amumu96 in #4771
feat: add Glm4MoeLiteForCausalLM support by @amumu96 in #4835
feat: parallelize multi-replica terminate_model and improve UI delete UX by @m199369309 in #4825
FEAT: [model] qwen3.6 support by @llyycchhee in #4831
FEAT: [model] MiniMax-M2.7 support by @llyycchhee in #4843
feat(tool_parser): add plain format support to DeepSeek V3.2 tool parser by @amumu96 in #4842
FEAT: [model] glm-5.1 support by @llyycchhee in #4832

Enhancements

ENH: update model "qwen3.5" JSON by @llyycchhee in #4801
ENH: update model "DeepSeek-V3.2" JSON by @amumu96 in #4813
ENH: update models JSON [embedding] by @XprobeBot in #4824
ENH: update model "qwen3.5" JSON by @qinxuye in #4821
ENH: update models JSON [embedding, rerank] by @XprobeBot in #4841
ENH: update 2 models JSON ("MiniMax-M2.7", "glm-5.1") by @XprobeBot in #4848
BLD: update xllamacpp to newest version for docker by @qinxuye in #4819
BLD: Remove pre-release PyTorch installation from Dockerfile by @zwt-1234 in #4836
BLD: remove torch related installation in aarch64 dockerfile by @zwt-1234 in #4840
BLD: remove torchcodec installation in Dockerfile aarch64 by @zwt-1234 in #4844
BLD: Modify docker build command for aarch64 image by @zwt-1234 in #4853

Bug fixes

fix: replace eval() with safe alternatives to prevent RCE in tool parsers by @Ricardo-M-L in #4786
fix: support JSON object parameters in CLI by @Ricardo-M-L in #4787
fix: support Jina API task parameters for jina-embeddings-v4 by @Ricardo-M-L in #4788
fix(ui): handle mixed dict and ChatMessage types in history by @qinxuye in #4814
fix(vllm): fix gemma-4 tool calls by @llyycchhee in #4815
fix(docker): unpin torchcodec to fix 503 error on reranker/embedding model load by @FlintyLemming in #4817
fix: handle missing 'cpu' key in get_cluster_device_info to prevent KeyError 500 by @m199369309 in #4822
fix: venv concurrent creation race, cold-start lock dir, and jina-embeddings-v4 torch mismatch by @m199369309 in #4823
fix: dynamic CUDA version check for extra_index_url by @Gmgge in #4820
fix: vLLM multi-node distributed init and pipeline parallel inference by @amumu96 in #4834
fix: venv torchvision alignment, supervisor RPC timeouts, get_model flood protection, replica pre-check, and safe log handler by @m199369309 in #4839
fix: remove last message role restriction in chat completion endpoint by @amumu96 in #4833
fix(security): prevent pwn-request vulnerability in gen_docs workflow by @qinxuye in #4850
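
For readers wondering what the eval() replacement in #4786 looks like in practice, here is a minimal sketch of the general technique. It is not the actual patch: the function name `parse_tool_arguments` and the fallback order (JSON first, then Python literals, then the raw string) are assumptions made for illustration only.

```python
import ast
import json
from typing import Any


def parse_tool_arguments(raw: str) -> Any:
    """Parse a model-emitted argument string without executing it.

    Hypothetical helper, not part of the inference codebase.
    """
    # Most tool-call payloads are JSON, so try that first.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # ast.literal_eval accepts only Python literals (dicts, lists, strings,
    # numbers, ...) and rejects function calls and attribute access.
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError, TypeError):
        # Anything else is returned as plain text instead of being evaluated.
        return raw
```

Unlike eval(), neither parser can run code, so a payload such as `__import__('os').system(...)` comes back as an inert string.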

Others

refactor(device_utils): replace if/elif chains with DeviceSpec registry by @amumu96 in #4846

Full Changelog: v2.5.0...v2.7.0