What's new in 1.6.1 (2025-05-30)
These are the changes in inference v1.6.1.
New features
- FEAT: llama.cpp backend supports multimodal models by @codingl2k1 in #3442
- FEAT: Auto NGL for llama.cpp backend by @codingl2k1 in #3518
- FEAT: [UI] add hint for common parameters with support for custom input by @yiboyasss in #3521
- FEAT: add some other paraformer series models by @leslie2046 in #3536
- FEAT: support Deepseek-R1-0528 by @Jun-Howie in #3539
- FEAT: support deepseek-r1-0528-qwen3 by @Jun-Howie in #3552
Enhancements
- ENH: [rerank] add instruction for minicpm-reranker by @llyycchhee in #3453
- ENH: pass extra arguments for speech2text API by @leslie2046 in #3516
- ENH: add modelscope support for kolors by @qinxuye in #3534
- ENH: remove check when specified GPU index for vllm by @kota-iizuka in #3527
- ENH: Support HybridCache in transformers lib, mainly for gemma3 chat model by @ChengjieLi28 in #3538
- ENH: support virtualenv for chattts by @qinxuye in #3541
- BLD: fix setup.cfg by @qinxuye in #3467
- BLD: update flashinfer version by @amumu96 in #3549
- REF: Refactor for multimodal llm models by @ChengjieLi28 in #3462
Bug fixes
- BUG: fix input for jina clip by @llyycchhee in #3440
- BUG: [UI] fix white-screen bug when deleting cache files by @yiboyasss in #3482
- BUG: fix import_submodules, ignore test files by @Gmgge in #3545
Documentation
- DOC: remove llama-cpp-python related doc & refine model_ability parts by @qinxuye in #3519
- DOC: Update doc about cosyvoice-2.0 stream and auto NGL by @codingl2k1 in #3547
New Contributors
- @kota-iizuka made their first contribution in #3527
Full Changelog: v1.6.0...v1.6.1