What's new in 1.6.1 (2025-05-30)
These are the changes in inference v1.6.1.
New features
- FEAT: llama.cpp backend supports multimodal models by @codingl2k1 in #3442
- FEAT: Auto NGL for llama.cpp backend by @codingl2k1 in #3518
- FEAT: [UI] add hint for common parameters with support for custom input by @yiboyasss in #3521
- FEAT: add some other paraformer series models by @leslie2046 in #3536
- FEAT: support Deepseek-R1-0528 by @Jun-Howie in #3539
- FEAT: support deepseek-r1-0528-qwen3 by @Jun-Howie in #3552
Enhancements
- ENH: [rerank] add instruction for minicpm-reranker by @llyycchhee in #3453
- ENH: pass extra arguments for speech2text API by @leslie2046 in #3516
- ENH: add modelscope support for kolors by @qinxuye in #3534
- ENH: remove check when specified GPU index for vllm by @kota-iizuka in #3527
- ENH: Support HybridCache in transformers lib, mainly for gemma3 chat model by @ChengjieLi28 in #3538
- ENH: support virtualenv for chattts by @qinxuye in #3541
- BLD: fix setup.cfg by @qinxuye in #3467
- BLD: update flashinfer version by @amumu96 in #3549
- REF: Refactor for multimodal llm models by @ChengjieLi28 in #3462
Bug fixes
- BUG: fix input for jina clip by @llyycchhee in #3440
- BUG: [UI] fix white-screen bug when deleting cache files by @yiboyasss in #3482
- BUG: fix import_submodules, ignore test files by @Gmgge in #3545
Documentation
- DOC: remove llama-cpp-python related doc & refine model_ability parts by @qinxuye in #3519
- DOC: Update doc about cosyvoice-2.0 stream and auto NGL by @codingl2k1 in #3547
New Contributors
- @kota-iizuka made their first contribution in #3527
Full Changelog: v1.6.0...v1.6.1