github xorbitsai/inference v2.4.0

What's new in 2.4.0 (2026-03-29)

These are the changes in inference v2.4.0.

New features

Enhancements

Bug fixes

  • BUG: Fix async client FormData handling and response lifecycle issues by @qinxuye in #4687
  • BUG: MLX backend accumulates intermediate generation steps into final output (tested on 1.17.0, 2.0.0, 2.1.0) #4615 by @nasircsms in #4617
  • fix(worker): inject parent site-packages into child venv via .pth file by @nasircsms in #4692
  • BUG: launch multi gpu qwen3.5 error by @llyycchhee in #4700
  • fix(tool_call): add qwen3.5 by @llyycchhee in #4703
  • fix(qwen3.5): support tool calls by @llyycchhee in #4709
  • FIX: qwen3.5 reasoning parse by @llyycchhee in #4719
  • fix(qwen3.5): support XML-like tool call format in non-streaming mode by @amumu96 in #4715
  • FIX: webui crash when gpu_utilization is none by @leslie2046 in #4728
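
The .pth-based fix in #4692 relies on a standard Python mechanism: when a directory is processed as a site directory, each path listed in a .pth file inside it is appended to sys.path. The sketch below illustrates only that mechanism and is not the code from the PR. The directory names, the file name parent_site.pth, and the helper link_site_packages are hypothetical, and site.addsitedir stands in for the interpreter startup that would normally process the child venv's site-packages.

```python
import importlib
import site
import sys
import tempfile
from pathlib import Path


def link_site_packages(child_site: Path, parent_site: Path,
                       name: str = "parent_site.pth") -> Path:
    """Write a .pth file into child_site that points at parent_site.

    Hypothetical helper: the file name and layout are assumptions,
    not what the PR actually writes.
    """
    pth = child_site / name
    # Each non-import line of a .pth file is treated as a path to add.
    pth.write_text(str(parent_site) + "\n", encoding="utf-8")
    return pth


def demo() -> int:
    # Two throwaway directories stand in for the parent environment's
    # site-packages and the child venv's site-packages.
    root = Path(tempfile.mkdtemp())
    parent = root / "parent_site_packages"
    child = root / "child_site_packages"
    parent.mkdir()
    child.mkdir()

    # A module that exists only on the parent side.
    (parent / "parent_only_mod.py").write_text("VALUE = 42\n", encoding="utf-8")

    link_site_packages(child, parent)

    # Process the child directory as a site directory; this reads the
    # .pth file and appends the parent path to sys.path.
    site.addsitedir(str(child))
    importlib.invalidate_caches()

    module = importlib.import_module("parent_only_mod")
    return module.VALUE


if __name__ == "__main__":
    print(demo())
```

Paths listed in a .pth file are appended to sys.path, so packages installed in the child environment itself are still found first.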

Documentation

New Contributors

Full Changelog: v2.3.0...v2.4.0
