What's new in 0.12.1 (2024-06-14)
These are the changes in inference v0.12.1.
New features
- FEAT: qwen2-instruct support tool call by @ayhhyhh in #1631
- FEAT: Added a method to download models from csghub. by @hainaweiben in #1627
- FEAT: glm4-chat support tool call by @codingl2k1 in #1617
- FEAT: [UI] Supports viewing and deleting cache data. by @yiboyasss in #1637
Enhancements
- ENH: modelscope for audio models by @Minamiyama in #1607
- ENH: Supports
generate
interface for continuous batching by @ChengjieLi28 in #1621 - ENH: quantization for glm-4v by @Minamiyama in #1610
Bug fixes
- BUG: Fix wheel package missing thirdparty ChatTTS by @codingl2k1 in #1606
- BUG: fix XINFERENCE_MODEL_SRC behavior by @LukeWang-Plus in #1616
- BUG: Filtering Step for Streaming Responses to Qwen's Tool Calls when using vLLM by @zhanghx0905 in #1598
Others
- Remove selected cache models by @hainaweiben in #1613
New Contributors
- @LukeWang-Plus made their first contribution in #1616
- @ayhhyhh made their first contribution in #1631
Full Changelog: v0.12.0...v0.12.1