What's new in 0.13.0 (2024-07-05)
These are the changes in inference v0.13.0.
New features
Enhancements
- ENH: Added GGUF files for `qwen2` by @qinxuye in #1745
- ENH: Add more log modules by @ChengjieLi28 in #1771
- ENH: Continuous batching supports the vision model ability by @ChengjieLi28 in #1724
- ENH: Add guard for model launching by @frostyplanet in #1680
- BLD: Supports Aliyun docker image by @ChengjieLi28 in #1753
- BLD: GPU Docker image uses the `vllm` image as base by @ChengjieLi28 in #1759
- BLD: Pin `llama-cpp-python` to `v0.2.77` in Docker for stability by @ChengjieLi28 in #1767
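If you build a custom image on top of the published one, the same pin can be reproduced with a single install line. This is a minimal sketch, not part of the release: the base image name and tag are illustrative, and only the `llama-cpp-python==0.2.77` version comes from the changelog entry above.

```dockerfile
# Illustrative base image and tag -- substitute the image you actually deploy.
FROM xprobe/xinference:latest

# Pin llama-cpp-python for stability, matching the version pinned in the 0.13.0 images.
RUN pip install --no-cache-dir "llama-cpp-python==0.2.77"
```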
Bug fixes
- BUG: Fix `glm4` tool call by @codingl2k1 in #1747
- BUG: [UI] Fix authentication mode related bugs by @yiboyasss in #1772
- BUG: Fix Python client to return documents for the rerank task by default by @ChengjieLi28 in #1780
- BUG: Fix LLM-based reranker that may raise a TypeError by @codingl2k1 in #1794
- BUG: Fix `deepseek-vl-chat` by @qinxuye in #1795
Tests
- TST: Fix `llama-cpp-python` issue in CI by @ChengjieLi28 in #1763
Documentation
- DOC: Update continuous batching and docker usage by @ChengjieLi28 in #1785
Full Changelog: v0.12.3...v0.13.0