# What's new in 1.16.0 (2025-12-27)
These are the changes in inference v1.16.0.
## New features
- FEAT: [model] DeepSeek-V3.2-Exp support by @Jun-Howie in #4374
- FEAT: Add vLLM backend support for DeepSeek-V3.2 by @Jun-Howie in #4377
- FEAT: Add vLLM backend support for DeepSeek-V3.2-Exp by @Jun-Howie in #4375
- FEAT: vacc support by @ZhikaiGuo960110 in #4382
- FEAT: support vlm for vacc by @ZhikaiGuo960110 in #4385
- FEAT: [model] Fun-ASR-Nano-2512 support by @leslie2046 in #4397
- FEAT: [model] Qwen-Image-Layered support by @OliverBryant in #4395
- FEAT: [model] Fun-ASR-MLT-Nano-2512 support by @leslie2046 in #4398
- FEAT: continuous batching support for MLX chat models by @qinxuye in #4403
- FEAT: Add the architectures field for llm model launch by @OliverBryant in #4405
- FEAT: [UI] image models support configuration via environment variables and custom parameters by @yiboyasss in #4413
- FEAT: support rerank async batch by @llyycchhee in #4414
- FEAT: Support vLLM backend for MiniMaxM2ForCausalLM by @Jun-Howie in #4412
## Enhancements
- ENH: assign replicas so that GPU indexes are allocated contiguously by @ZhikaiGuo960110 in #4370
- ENH: update model "DeepSeek-V3.2" JSON by @Jun-Howie in #4381
- ENH: update model "glm-4.5" JSON by @OliverBryant in #4383
- ENH: update 2 models JSON ("glm-4.1v-thinking", "glm-4.5v") by @OliverBryant in #4384
- ENH: support torchaudio 2.9.0 by @llyycchhee in #4390
- ENH: update 3 models JSON ("llama-2-chat", "llama-3", "llama-3-instruct") by @OliverBryant in #4400
- ENH: update 4 models JSON ("llama-3.1", "llama-3.1-instruct", "llama-3.2-vision-instruct", ... +1 more) by @OliverBryant in #4401
- ENH: update model "jina-embeddings-v3" JSON by @XprobeBot in #4404
- ENH: update models JSON [audio, embedding, image, llm, video] by @XprobeBot in #4407
- ENH: update models JSON [audio, image] by @XprobeBot in #4408
- ENH: update model "Z-Image-Turbo" JSON by @OliverBryant in #4409
- ENH: update 2 models JSON ("DeepSeek-V3.2", "DeepSeek-V3.2-Exp") by @Jun-Howie in #4392
- ENH: update models JSON [llm] by @XprobeBot in #4415
- BLD: remove python 3.9 support by @OliverBryant in #4387
- BLD: update Dockerfile to CUDA 12.9 to use vLLM v0.11.2 by @zwt-1234 in #4393
## Bug fixes
- BUG: fix PaddleOCR-VL output by @leslie2046 in #4368
- BUG: fix analysis error for custom embedding and rerank models by @OliverBryant in #4367
- BUG: fix failure to launch models on CPU and multi-worker launch errors by @OliverBryant in #4361
- BUG: fix null return from the OCR API and add docs on modifying model_size by @OliverBryant in #4331
- BUG: fix n_gpu parameter by @OliverBryant in #4411
**Full Changelog**: v1.15.0...v1.16.0