# What's new in 1.16.0 (2025-12-27)
These are the changes in inference v1.16.0.
## New features
- FEAT: [model] DeepSeek-V3.2-Exp support by @Jun-Howie in #4374
- FEAT: Add vLLM backend support for DeepSeek-V3.2 by @Jun-Howie in #4377
- FEAT: Add vLLM backend support for DeepSeek-V3.2-Exp by @Jun-Howie in #4375
- FEAT: vacc support by @ZhikaiGuo960110 in #4382
- FEAT: support vlm for vacc by @ZhikaiGuo960110 in #4385
- FEAT: [model] Fun-ASR-Nano-2512 support by @leslie2046 in #4397
- FEAT: [model] Qwen-Image-Layered support by @OliverBryant in #4395
- FEAT: [model] Fun-ASR-MLT-Nano-2512 support by @leslie2046 in #4398
- FEAT: continuous batching support for MLX chat models by @qinxuye in #4403
- FEAT: Add the architectures field for llm model launch by @OliverBryant in #4405
- FEAT: [UI] image models support configuration via environment variables and custom parameters by @yiboyasss in #4413
- FEAT: support rerank async batch by @llyycchhee in #4414
- FEAT: Support vLLM backend for MiniMaxM2ForCausalLM by @Jun-Howie in #4412
## Enhancements
- ENH: assign replicas so that GPU indexes are allocated contiguously by @ZhikaiGuo960110 in #4370
- ENH: update model "DeepSeek-V3.2" JSON by @Jun-Howie in #4381
- ENH: update model "glm-4.5" JSON by @OliverBryant in #4383
- ENH: update 2 models JSON ("glm-4.1v-thinking", "glm-4.5v") by @OliverBryant in #4384
- ENH: support torchaudio 2.9.0 by @llyycchhee in #4390
- ENH: update 3 models JSON ("llama-2-chat", "llama-3", "llama-3-instruct") by @OliverBryant in #4400
- ENH: update 4 models JSON ("llama-3.1", "llama-3.1-instruct", "llama-3.2-vision-instruct", ... +1 more) by @OliverBryant in #4401
- ENH: update model "jina-embeddings-v3" JSON by @XprobeBot in #4404
- ENH: update models JSON [audio, embedding, image, llm, video] by @XprobeBot in #4407
- ENH: update models JSON [audio, image] by @XprobeBot in #4408
- ENH: update model "Z-Image-Turbo" JSON by @OliverBryant in #4409
- ENH: update 2 models JSON ("DeepSeek-V3.2", "DeepSeek-V3.2-Exp") by @Jun-Howie in #4392
- ENH: update models JSON [llm] by @XprobeBot in #4415
- BLD: remove python 3.9 support by @OliverBryant in #4387
- BLD: update Dockerfile to CUDA 12.9 to use vLLM v0.11.2 by @zwt-1234 in #4393
## Bug fixes
- BUG: fix PaddleOCR-VL output by @leslie2046 in #4368
- BUG: fix analysis error for custom embedding and rerank models by @OliverBryant in #4367
- BUG: fix failure to launch models on CPU and multi-worker launch errors by @OliverBryant in #4361
- BUG: fix null return from the OCR API and add docs on modifying model_size by @OliverBryant in #4331
- BUG: fix n_gpu parameter by @OliverBryant in #4411
**Full Changelog**: v1.15.0...v1.16.0