What's new in 2.4.0 (2026-03-29)
These are the changes in inference v2.4.0.
New features
- FEAT: introducing OTEL by @leslie2046 in #4666
- FEAT: [UI] add Xagent link by @yiboyasss in #4693
- FEAT: [UI] remove featured/all toggle and prioritize featured models by @yiboyasss in #4694
- feat(vllm): support v0.18.0 by @llyycchhee in #4718
- FEAT: add gpu load metrics by @leslie2046 in #4712
- feat: Upgrade the base image to version 0.17.1 and add support for aarch64 version images by @zwt-1234 in #4726
- feat(ci): fix aarch64 build by @zwt-1234 in #4735
Enhancements
- ENH: update model "qwen3.5" JSON by @qinxuye in #4689
- ENH: update model "qwen3.5" JSON by @llyycchhee in #4707
- ENH: update models JSON [llm] by @XprobeBot in #4710
- ENH: update models JSON [llm] by @XprobeBot in #4713
- enh: adapt normalize param of vllm>0.16.0 for embedding models. by @la1ty in #4729
- BLD: Requirements dependency version adjustment by @zwt-1234 in #4736
- bld: Requirements dependency version adjustment by @zwt-1234 in #4737
- bld: Requirements dependency version adjustment by @zwt-1234 in #4738
- REF: parallelize supervisor model registration listing by @leslie2046 in #4690
Bug fixes
- BUG: Fix async client FormData handling and response lifecycle issues by @qinxuye in #4687
- BUG: MLX backend accumulates intermediate generation steps into final output (tested on 1.17.0, 2.0.0, 2.1.0) #4615 by @nasircsms in #4617
- fix(worker): inject parent site-packages into child venv via .pth file by @nasircsms in #4692
- BUG: launch multi gpu qwen3.5 error by @llyycchhee in #4700
- fix(tool_call): add qwen3.5 by @llyycchhee in #4703
- fix(qwen3.5): support tool calls by @llyycchhee in #4709
- FIX: qwen3.5 reasoning parse by @llyycchhee in #4719
- fix(qwen3.5): support XML-like tool call format in non-streaming mode by @amumu96 in #4715
- FIX: webui crash when gpu_utilization is none by @leslie2046 in #4728
Documentation
New Contributors
- @nasircsms made their first contribution in #4617
- @octo-patch made their first contribution in #4704
- @la1ty made their first contribution in #4729
Full Changelog: v2.3.0...v2.4.0