What's new in 2.7.0 (2026-04-25)
These are the changes in inference v2.7.0.
New features
- FEAT: support removing replicas by @leslie2046 in #4784
- feat(ui): optimize configuration cache dialog and terminology by @leslie2046 in #4807
- FEAT: add DeepSeek V3.2 tool parser for DSML format by @amumu96 in #4771
- feat: add Glm4MoeLiteForCausalLM support by @amumu96 in #4835
- feat: parallelize multi-replica terminate_model and improve UI delete UX by @m199369309 in #4825
- FEAT: [model] qwen3.6 support by @llyycchhee in #4831
- FEAT: [model] MiniMax-M2.7 support by @llyycchhee in #4843
- feat(tool_parser): add plain format support to DeepSeek V3.2 tool parser by @amumu96 in #4842
- FEAT: [model] glm-5.1 support by @llyycchhee in #4832
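PR #4825 parallelizes `terminate_model` across replicas. These notes do not show how Xinference does this internally. The following is a minimal sketch that assumes each replica can be stopped by an independent coroutine. The `terminate_replica` function and the replica IDs are hypothetical stand-ins. `asyncio.gather` runs the shutdowns concurrently, so total latency tracks the slowest replica rather than the sum of all of them.

```python
import asyncio
from typing import List


async def terminate_replica(replica_id: str, delay: float) -> str:
    # Hypothetical stand-in for stopping one replica; sleeps to simulate work.
    await asyncio.sleep(delay)
    return replica_id


async def terminate_model(replica_ids: List[str], delay: float = 0.1) -> List[str]:
    # Start every replica shutdown at once. return_exceptions=True makes
    # gather wait for all replicas and collect failures as values instead
    # of raising on the first one.
    results = await asyncio.gather(
        *(terminate_replica(rid, delay) for rid in replica_ids),
        return_exceptions=True,
    )
    failed = [r for r in results if isinstance(r, BaseException)]
    if failed:
        raise RuntimeError(
            f"{len(failed)} replica(s) failed to terminate"
        ) from failed[0]
    # gather preserves input order, so results line up with replica_ids.
    return list(results)
```

Usage: `asyncio.run(terminate_model(["r0", "r1", "r2"]))` stops three hypothetical replicas in roughly the time of one.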
Enhancements
- ENH: update model "qwen3.5" JSON by @llyycchhee in #4801
- ENH: update model "DeepSeek-V3.2" JSON by @amumu96 in #4813
- ENH: update models JSON [embedding] by @XprobeBot in #4824
- ENH: update model "qwen3.5" JSON by @qinxuye in #4821
- ENH: update models JSON [embedding, rerank] by @XprobeBot in #4841
- ENH: update 2 models JSON ("MiniMax-M2.7", "glm-5.1") by @XprobeBot in #4848
- BLD: update xllamacpp to the newest version for Docker by @qinxuye in #4819
- BLD: Remove pre-release PyTorch installation from Dockerfile by @zwt-1234 in #4836
- BLD: remove torch related installation in aarch64 dockerfile by @zwt-1234 in #4840
- BLD: remove torchcodec installation in Dockerfile aarch64 by @zwt-1234 in #4844
- BLD: Modify docker build command for aarch64 image by @zwt-1234 in #4853
Bug fixes
- fix: replace eval() with safe alternatives to prevent RCE in tool parsers by @Ricardo-M-L in #4786
- fix: support JSON object parameters in CLI by @Ricardo-M-L in #4787
- fix: support Jina API task parameters for jina-embeddings-v4 by @Ricardo-M-L in #4788
- fix(ui): handle mixed dict and ChatMessage types in history by @qinxuye in #4814
- fix(vllm): fix gemma-4 tool calls by @llyycchhee in #4815
- fix(docker): unpin torchcodec to fix 503 error on reranker/embedding model load by @FlintyLemming in #4817
- fix: handle missing 'cpu' key in get_cluster_device_info to prevent a KeyError (HTTP 500) by @m199369309 in #4822
- fix: venv concurrent creation race, cold-start lock dir, and jina-embeddings-v4 torch mismatch by @m199369309 in #4823
- fix: dynamic CUDA version check for extra_index_url by @Gmgge in #4820
- fix: vLLM multi-node distributed init and pipeline parallel inference by @amumu96 in #4834
- fix: venv torchvision alignment, supervisor RPC timeouts, get_model flood protection, replica pre-check, and safe log handler by @m199369309 in #4839
- fix: remove last message role restriction in chat completion endpoint by @amumu96 in #4833
- fix(security): prevent pwn-request vulnerability in gen_docs workflow by @qinxuye in #4850
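The tool-parser fix in #4786 removes `eval()`, which executes arbitrary Python expressions taken from model output. These notes do not say which replacement the PR uses. A common safe pattern tries `json.loads` first and then falls back to `ast.literal_eval`, which accepts only Python literals. The minimal sketch below shows that pattern. The helper name `parse_tool_arguments` is hypothetical and is not part of Xinference's API.

```python
import ast
import json
from typing import Any


def parse_tool_arguments(raw: str) -> Any:
    """Parse a tool-call argument string without executing any code.

    Hypothetical helper for illustration only.
    """
    try:
        # Most tool calls emit JSON, e.g. {"city": "Paris"}.
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    try:
        # Python-literal output such as {'city': 'Paris', 'days': 3} is
        # accepted; function calls, names and attribute access are rejected.
        return ast.literal_eval(raw)
    except (ValueError, TypeError, SyntaxError) as exc:
        raise ValueError(f"unparseable tool arguments: {raw!r}") from exc
```

Unlike `eval()`, a payload such as `__import__('os').system(...)` is rejected with a `ValueError` instead of being run.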
New Contributors
- @Ricardo-M-L made their first contribution in #4786
- @FlintyLemming made their first contribution in #4817
- @m199369309 made their first contribution in #4822
Full Changelog: v2.5.0...v2.7.0