What's new in 1.4.1 (2025-04-03)
These are the changes in inference v1.4.1.
New features
- FEAT: Support Fin-R1 model by @Jun-Howie in #3116
- FEAT: distributed inference for vLLM by @qinxuye in #3120
- FEAT: Support GPTQ (Int4, Int8) and FP8 quantization for the Fin-R1 model by @Jun-Howie in #3157
- FEAT: fix the quantization parameter not working in the vLLM engine by @amumu96 in #3159
- FEAT: vision model support for the SGLang engine by @Minamiyama in #3150
- FEAT: support max_completion_tokens by @amumu96 in #3168
- FEAT: support DeepSeek-VL2 by @Jun-Howie in #3179
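Xinference serves models through OpenAI-compatible endpoints, so the newly supported max_completion_tokens parameter can be supplied in the chat-completion request body, where it supersedes the older max_tokens field. A minimal sketch of assembling such a payload; the helper function, model name, and values below are illustrative placeholders, not Xinference API:

```python
# Sketch: build an OpenAI-style chat-completion request body that prefers the
# newer max_completion_tokens field over the deprecated max_tokens field.
# build_chat_payload is a hypothetical helper, not part of Xinference itself.

def build_chat_payload(model, messages, max_completion_tokens=None, max_tokens=None):
    """Assemble a request body; max_completion_tokens wins if both are given."""
    payload = {"model": model, "messages": messages}
    limit = max_completion_tokens if max_completion_tokens is not None else max_tokens
    if limit is not None:
        payload["max_completion_tokens"] = limit
    return payload

payload = build_chat_payload(
    "Fin-R1",  # placeholder model name from this release
    [{"role": "user", "content": "Summarize today's market."}],
    max_completion_tokens=256,
)
print(payload["max_completion_tokens"])  # 256
```

The same body would be POSTed to the server's /v1/chat/completions route; omitting both limit fields leaves the token budget to the engine's default.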
Enhancements
- ENH: support for qwen2.5-vl-32b by @Minamiyama in #3119
- ENH: the SGLang engine now supports GPTQ Int8 quantization by @Minamiyama in #3149
- ENH: Add validation of n_worker by @rexjm in #3166
- ENH: add support for qwen2.5-vl-32b-awq and fix a download-hub typo for the 7b-awq variant by @Minamiyama in #3169
- BLD: use gptqmodel to replace auto-gptq by @qinxuye in #3147
- BLD: fix Docker build failure by @amumu96 in #3164
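With this release a model such as Fin-R1 can be launched in several quantization formats (GPTQ Int4/Int8 and FP8). A small sketch of validating a requested format against the set a model supports; the supported-format table and helper below are illustrative assumptions, not Xinference's actual registry:

```python
# Sketch: check a requested quantization format against what a model supports,
# mirroring the Fin-R1 options added in this release (GPTQ Int4/Int8, FP8).
# SUPPORTED_QUANTIZATIONS and check_quantization are hypothetical, not Xinference API.

SUPPORTED_QUANTIZATIONS = {
    "Fin-R1": {"Int4", "Int8", "fp8", "none"},  # assumed format names
}

def check_quantization(model, quantization):
    """Return the format if supported, otherwise raise with the valid choices."""
    supported = SUPPORTED_QUANTIZATIONS.get(model, {"none"})
    if quantization not in supported:
        raise ValueError(
            f"{model} does not support quantization {quantization!r}; "
            f"choose one of {sorted(supported)}"
        )
    return quantization

print(check_quantization("Fin-R1", "Int4"))  # Int4
```

Failing fast on an unsupported format gives a clearer error than letting the engine reject the weights at load time.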
Bug fixes
- BUG: fix a PyTorch TypeError by making _ModelWrapper inherit from nn.Module by @JamesFlare1212 in #3131
- BUG: fix LLM streaming responses by @amumu96 in #3115
- BUG: prevent a potential hang when stopping distributed vLLM inference by @qinxuye in #3180
New Contributors
- @JamesFlare1212 made their first contribution in #3131
- @rexjm made their first contribution in #3166
Full Changelog: v1.4.0...v1.4.1