What's new in 2.4.0 (2026-03-29)

These are the changes in inference v2.4.0.

New features

FEAT: introduce OpenTelemetry (OTEL) support by @leslie2046 in #4666
FEAT: [UI] add Xagent link by @yiboyasss in #4693
FEAT: [UI] remove featured/all toggle and prioritize featured models by @yiboyasss in #4694
feat(vllm): support v0.18.0 by @llyycchhee in #4718
FEAT: add GPU load metrics by @leslie2046 in #4712
feat: upgrade the base image to version 0.17.1 and add aarch64 images by @zwt-1234 in #4726
feat(ci): fix aarch64 build by @zwt-1234 in #4735

Enhancements

ENH: update model "qwen3.5" JSON by @qinxuye in #4689
ENH: update model "qwen3.5" JSON by @llyycchhee in #4707
ENH: update models JSON [llm] by @XprobeBot in #4710
ENH: update models JSON [llm] by @XprobeBot in #4713
enh: adapt the normalize parameter for embedding models on vllm>0.16.0 by @la1ty in #4729
BLD: adjust requirements dependency versions by @zwt-1234 in #4736
bld: adjust requirements dependency versions by @zwt-1234 in #4737
bld: adjust requirements dependency versions by @zwt-1234 in #4738
REF: parallelize supervisor model registration listing by @leslie2046 in #4690

Bug fixes

BUG: Fix async client FormData handling and response lifecycle issues by @qinxuye in #4687
BUG: MLX backend accumulates intermediate generation steps into final output (tested on 1.17.0, 2.0.0, 2.1.0) #4615 by @nasircsms in #4617
fix(worker): inject parent site-packages into child venv via .pth file by @nasircsms in #4692
BUG: fix error when launching qwen3.5 on multiple GPUs by @llyycchhee in #4700
fix(tool_call): add qwen3.5 by @llyycchhee in #4703
fix(qwen3.5): support tool calls by @llyycchhee in #4709
FIX: qwen3.5 reasoning parse by @llyycchhee in #4719
fix(qwen3.5): support XML-like tool call format in non-streaming mode by @amumu96 in #4715
FIX: WebUI crash when gpu_utilization is None by @leslie2046 in #4728

Full Changelog: v2.3.0...v2.4.0