What's new in 1.9.1 (2025-08-30)
These are the changes in inference v1.9.1.
New features
- FEAT: Qwen-Image-Edit by @qinxuye in #3989
- FEAT: Wan 2.2 by @qinxuye in #3996
- FEAT: Update CosyVoice2 to support both streaming and non-streaming speech generation by @Gmgge in #3994
- FEAT: support qwen-image-lightning by @qinxuye in #3995
- FEAT: [UI] support gpu_count configuration in image model. by @yiboyasss in #4016
- FEAT: image2image and inpainting for qwen-image by @qinxuye in #4014
- FEAT: Support Custom vllm embedding dim by @zhcn000000 in #4000
- FEAT: [embedding] support dimensions for embedding by @llyycchhee in #3965
- FEAT: [Model] Support DeepSeek-V3.1 Quantization and tool by @Jun-Howie in #4022
- FEAT: Seed-OSS-36B by @Jun-Howie in #4020
Enhancements
- ENH: added zero-shot and voice-cloning abilities for audio models by @qianduoduo0904 in #3968
- ENH: Add Template for Qwen3 Reranker when model_engine = vllm by @zhcn000000 in #3983
- ENH: Update the environment dependencies for cosyvoice2 by @Gmgge in #4015
- ENH: Compat with xllamacpp 0.2.0 by @codingl2k1 in #4004
- ENH: support chat_template_kwargs for llama.cpp by @qinxuye in #3988
- BLD: Clean up Docker's last legacy cache and images before executing each step by @zwt-1234 in #3963
- BLD: fix CI failures by @qinxuye in #4002
Bug fixes
- BUG: disable flash_attention when GPU compute capability < 8.0 by @amumu96 in #3973
- BUG: fix rerank model creation by @qinxuye in #3977
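The flash_attention fix in #3973 comes down to a threshold check on the GPU's compute capability. The sketch below illustrates that rule only. It is not Xinference's actual code, and the function name flash_attention_supported is hypothetical. It assumes the capability arrives as a (major, minor) tuple, the same shape that torch.cuda.get_device_capability() returns.

```python
def flash_attention_supported(capability: tuple[int, int]) -> bool:
    """Return True if flash_attention should stay enabled for this GPU.

    Hypothetical helper: `capability` is a (major, minor) compute-capability
    tuple. Per #3973, flash_attention is disabled below 8.0 (pre-Ampere GPUs).
    """
    # Python compares tuples element by element: major first, then minor.
    return capability >= (8, 0)


# Example capabilities: V100 is 7.0, T4 is 7.5, A100 is 8.0.
for name, cap in [("V100", (7, 0)), ("T4", (7, 5)), ("A100", (8, 0))]:
    print(name, flash_attention_supported(cap))
```

Comparing whole tuples avoids the common mistake of checking only the minor version. With this check, a 7.5 card such as the T4 is correctly rejected, and any 9.x card is accepted.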
Documentation
- DOC: update models by @qinxuye in #3958
- DOC: add setting limitation of images for multi modal doc by @amumu96 in #4003
- DOC: Update docs about custom models by @OliverBryant in #4019
- DOC: update models & README by @qinxuye in #4023
Others
- FEAT: KAT-V1 by @Jun-Howie in #3998
New Contributors
- @qianduoduo0904 made their first contribution in #3968
- @OliverBryant made their first contribution in #4019
Full Changelog: v1.9.0...v1.9.1