What's new in 1.17.0 (2026-01-10)
These are the changes in inference v1.17.0.
New features
- FEAT: add enable_thinking kwarg support by @OliverBryant in #4423
- FEAT: Support MThreads (MUSA) GPU by @yeahdongcn in #4425
- FEAT: support distributed model launch for vLLM >= v0.11.0 by @OliverBryant in #4428
- FEAT: [model] Qwen-Image-Edit-2511 support by @OliverBryant in #4427
- FEAT: add minimax tool call support by @OliverBryant in #4434
- FEAT: [model] Qwen-Image-2512 support by @OliverBryant in #4435
- FEAT: support auto batch for sentence_transformers rerank by @llyycchhee in #4429
- FEAT: add multi-engine support for OCR and DeepSeek-OCR MLX support by @OliverBryant in #4437
- FEAT: add fp4 support by @OliverBryant in #4450
- FEAT: add video gguf support by @OliverBryant in #4458
- FEAT: add multi-engine support for image models by @OliverBryant in #4446
Enhancements
- ENH: update 4 models JSON ("Deepseek-V3.1", "deepseek-r1-0528", "deepseek-r1-0528-qwen3", ... +1 more) by @OliverBryant in #4445
- ENH: update model "DeepSeek-OCR" JSON by @OliverBryant in #4444
- ENH: support vLLM MTP and RoPE scaling by @ZhikaiGuo960110 in #4454
Bug fixes
- BUG: fix empty cache for vllm embedding & rerank by @ZhikaiGuo960110 in #4422
- BUG: fix the same worker being selected repeatedly by @OliverBryant in #4447
- BUG: fix vLLM OCR model that cannot be stopped by @OliverBryant in #4460
- BUG: fix in-progress model downloads that cannot be canceled by @OliverBryant in #4461
Documentation
- DOC: update new models and release notes for v1.16.0 by @qinxuye in #4416
- DOC: update docker docs by @qinxuye in #4419
- DOC: document vLLM + Torch + Xinference compatibility issue by @qiulang in #4442
New Contributors
- @yeahdongcn made their first contribution in #4425
Full Changelog: v1.16.0...v1.17.0