What's new in 1.9.1 (2025-08-30)

These are the changes in inference v1.9.1.

New features

FEAT: Qwen-Image-Edit by @qinxuye in #3989
FEAT: Wan 2.2 by @qinxuye in #3996
FEAT: Update CosyVoice2 to support both streaming and non-streaming speech generation by @Gmgge in #3994
FEAT: support qwen-image-lightning by @qinxuye in #3995
FEAT: [UI] support gpu_count configuration for image models by @yiboyasss in #4016
FEAT: image2image and inpainting for qwen-image by @qinxuye in #4014
FEAT: Support custom vLLM embedding dimension by @zhcn000000 in #4000
FEAT: [embedding] support dimensions for embedding by @llyycchhee in #3965
FEAT: [Model] Support DeepSeek-V3.1 quantization and tool calling by @Jun-Howie in #4022
FEAT: Seed-OSS-36B by @Jun-Howie in #4020

Enhancements

ENH: Add zero-shot and voice cloning support for audio models by @qianduoduo0904 in #3968
ENH: Add template for Qwen3 Reranker when model_engine = vllm by @zhcn000000 in #3983
ENH: Update the environment dependencies for cosyvoice2 by @Gmgge in #4015
ENH: Compat with xllamacpp 0.2.0 by @codingl2k1 in #4004
ENH: support chat_template_kwargs for llama.cpp by @qinxuye in #3988
BLD: Clean up Docker's last legacy cache and images before executing each step by @zwt-1234 in #3963
BLD: fix CI failures by @qinxuye in #4002

Bug fixes

BUG: disable flash_attention when GPU compute capability < 8.0 by @amumu96 in #3973
BUG: fix rerank model creation by @qinxuye in #3977
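
The flash_attention fix in #3973 gates the feature on GPU compute capability. The snippet below is only a rough sketch of that kind of check, not the code from the PR: the function name `flash_attention_supported` and the constant `MIN_FLASH_ATTN_CAPABILITY` are hypothetical, and the only detail taken from the entry above is the 8.0 threshold.

```python
from typing import Tuple

# Threshold taken from the changelog entry: capability below 8.0 disables flash attention.
# The constant name is hypothetical, chosen for this illustration only.
MIN_FLASH_ATTN_CAPABILITY: Tuple[int, int] = (8, 0)


def flash_attention_supported(capability: Tuple[int, int]) -> bool:
    """Return True when a (major, minor) compute capability is at least 8.0.

    Tuples compare element by element, so (7, 5) < (8, 0) <= (8, 6) < (9, 0).
    """
    return tuple(capability) >= MIN_FLASH_ATTN_CAPABILITY


if __name__ == "__main__":
    # Example capabilities: 7.5 falls below the threshold, 8.0 and 9.0 do not.
    for cap in [(7, 5), (8, 0), (9, 0)]:
        print(cap, flash_attention_supported(cap))
```

With PyTorch installed, the capability tuple for the current device can be read with `torch.cuda.get_device_capability()`, which returns `(major, minor)`.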

Full Changelog: v1.9.0...v1.9.1