What's new in 0.15.0 (2024-09-06)
These are the changes in inference v0.15.0.
New features
- FEAT: CosyVoice model supports streaming replies by @wuminghui-coder in #2192
- FEAT: Support qwen2-vl-instruct by @Minamiyama in #2205
Enhancements
- ENH: Include openai-whisper in thirdparty by @qinxuye in #2232
- ENH: MiniCPM-V-2.6 supports continuous batching with the transformers engine by @ChengjieLi28 in #2238
- ENH: Unpad for image2image/inpainting models by @wxiwnd in #2229
- ENH: Refine request log and add optional request_id by @frostyplanet in #2173
- REF: Use `chat_template` for LLMs instead of `prompt_style` by @ChengjieLi28 in #2193
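For context on the `chat_template` change: chat templates (the Hugging Face convention) are Jinja templates that turn a list of role/content messages into a single prompt string. The plain-Python sketch below only illustrates that idea; the function name, the ChatML-style markers, and the layout are illustrative assumptions, not xinference's actual API or template.

```python
def apply_chat_template(messages, add_generation_prompt=True):
    """Sketch of what a chat template does: render role/content messages
    into one prompt string (here in a ChatML-like layout, for illustration)."""
    parts = []
    for msg in messages:
        # Wrap each message in role markers.
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>")
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = apply_chat_template([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```

In real deployments the template ships with the model (or is supplied at registration time) as a Jinja string, so formatting logic no longer needs hard-coded prompt styles per model family.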
Bug fixes
- BUG: Fix docker image startup issue due to entrypoint by @ChengjieLi28 in #2207
- BUG: Fix xinference initialization failure when the custom path is invalid by @amumu96 in #2208
- BUG: Use `default_uid` to replace `uid` of actors, which may override the xoscar actor's uid property, by @qinxuye in #2214
- BUG: Fix rerank max length by @qinxuye in #2219
- BUG: Fix logger bug for functions using generator decorators by @wxiwnd in #2215
- BUG: Fix rerank token count calculation by @qinxuye in #2228
- BUG: Fix embedding token calculation and optimize memory usage by @qinxuye in #2221
Documentation
- DOC: Modify the installation documentation to change single quotes to double quotes for Windows compatibility. by @nikelius in #2211
Others
- Revert "EHN: clean cache for VL models (#2163)" by @qinxuye in #2230
- CHORE: Docker image is only pushed to aliyun when releasing version by @ChengjieLi28 in #2216
- CHORE: Compatible with `openai >= 1.40` by @ChengjieLi28 in #2231
New Contributors
- @nikelius made their first contribution in #2211
- @wuminghui-coder made their first contribution in #2192
Full Changelog: v0.14.4...v0.15.0