What's new in 1.4.1 (2025-04-03)
These are the changes in inference v1.4.1.
New features
- FEAT: Support Fin-R1 model by @Jun-Howie in #3116
- FEAT: distributed inference for vLLM by @qinxuye in #3120
- FEAT: Support GPTQ (Int4, Int8) and FP8 quantization for the Fin-R1 model by @Jun-Howie in #3157
- FEAT: fix the quantization parameter not working in the vLLM engine by @amumu96 in #3159
- FEAT: vision model support for the SGLang engine by @Minamiyama in #3150
- FEAT: support max_completion_tokens by @amumu96 in #3168
- FEAT: support DeepSeek-VL2 by @Jun-Howie in #3179
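Xinference serves models through OpenAI-compatible endpoints, so the newly supported max_completion_tokens parameter can be supplied in the chat-completion request body, where it supersedes the older max_tokens field. A minimal sketch of assembling such a payload; the helper function, model name, and values below are illustrative placeholders, not Xinference API:

```python
# Sketch: build an OpenAI-style chat-completion request body that prefers the
# newer max_completion_tokens field over the deprecated max_tokens field.
# build_chat_payload is a hypothetical helper, not part of Xinference itself.

def build_chat_payload(model, messages, max_completion_tokens=None, max_tokens=None):
    """Assemble a request body; max_completion_tokens wins if both are given."""
    payload = {"model": model, "messages": messages}
    limit = max_completion_tokens if max_completion_tokens is not None else max_tokens
    if limit is not None:
        payload["max_completion_tokens"] = limit
    return payload

payload = build_chat_payload(
    "Fin-R1",  # placeholder model name from this release
    [{"role": "user", "content": "Summarize today's market."}],
    max_completion_tokens=256,
)
print(payload["max_completion_tokens"])  # 256
```

The same body would be POSTed to the server's /v1/chat/completions route; omitting both limit fields leaves the token budget to the engine's default.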
Enhancements
- ENH: support for qwen2.5-vl-32b by @Minamiyama in #3119
- ENH: the SGLang engine now supports GPTQ Int8 quantization by @Minamiyama in #3149
- ENH: Add validation of n_worker by @rexjm in #3166
- ENH: add support for qwen2.5-vl-32b-awq and fix a download-hub typo for the 7b-awq variant by @Minamiyama in #3169
- BLD: use gptqmodel to replace auto-gptq by @qinxuye in #3147
- BLD: fix Docker build failure by @amumu96 in #3164
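With this release a model such as Fin-R1 can be launched in several quantization formats (GPTQ Int4/Int8 and FP8). A small sketch of validating a requested format against the set a model supports; the supported-format table and helper below are illustrative assumptions, not Xinference's actual registry:

```python
# Sketch: check a requested quantization format against what a model supports,
# mirroring the Fin-R1 options added in this release (GPTQ Int4/Int8, FP8).
# SUPPORTED_QUANTIZATIONS and check_quantization are hypothetical, not Xinference API.

SUPPORTED_QUANTIZATIONS = {
    "Fin-R1": {"Int4", "Int8", "fp8", "none"},  # assumed format names
}

def check_quantization(model, quantization):
    """Return the format if supported, otherwise raise with the valid choices."""
    supported = SUPPORTED_QUANTIZATIONS.get(model, {"none"})
    if quantization not in supported:
        raise ValueError(
            f"{model} does not support quantization {quantization!r}; "
            f"choose one of {sorted(supported)}"
        )
    return quantization

print(check_quantization("Fin-R1", "Int4"))  # Int4
```

Failing fast on an unsupported format gives a clearer error than letting the engine reject the weights at load time.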
Bug fixes
- BUG: fix a PyTorch TypeError by making _ModelWrapper inherit from nn.Module by @JamesFlare1212 in #3131
- BUG: fix LLM streaming responses by @amumu96 in #3115
- BUG: prevent a potential hang when stopping distributed vLLM inference by @qinxuye in #3180
New Contributors
- @JamesFlare1212 made their first contribution in #3131
- @rexjm made their first contribution in #3166
Full Changelog: v1.4.0...v1.4.1