What's new in 0.14.0 (2024-08-02)
These are the changes in inference v0.14.0.
New features
- FEAT: Support model_path input when launching models by @Valdanitooooo in #1918 (see the sketch after this list)
- FEAT: Support gte-Qwen2-7B-instruct and multi-GPU deployment by @amumu96 in #1994
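The model_path feature lets a launch point at local weights instead of pulling them from a model hub. Below is a minimal sketch using the Python client; the endpoint, model name, local path, and the exact set of launch keyword arguments are illustrative assumptions rather than the definitive interface of the merged PR.

```python
# Sketch: launching a model whose weights already live on local disk.
# The endpoint, model name, and path below are placeholders.
from xinference.client import Client

client = Client("http://127.0.0.1:9997")

# Assumption: launch_model forwards `model_path` so the worker loads weights
# from this directory instead of downloading them from a model hub.
model_uid = client.launch_model(
    model_name="qwen2-instruct",
    model_engine="transformers",
    model_size_in_billions=7,
    model_format="pytorch",
    quantization="none",
    model_path="/data/models/qwen2-7b-instruct",
)
print(model_uid)
```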
Enhancements
- ENH: Add SGLang support for Llama 3 and Qwen 2 by @luweizheng in #1947
- ENH: Add cache_limit_gb option for MLX by @qinxuye in #1954 (see the sketch after this list)
- ENH: [benchmark] Add api-key support by @frostyplanet in #1961
- ENH: Support Gemma 2 and Llama 3.1 models for vLLM & SGLang by @vikrantrathore in #1929
- ENH: [K8s] Worker log directory name by @ChengjieLi28 in #1997
- ENH: Support image_to_image by @qinxuye in #1986 (see the sketch after this list)
- REF: Enable SGLang by default by @qinxuye in #1953
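For the new cache_limit_gb option, the sketch below assumes that engine-specific options are forwarded as extra keyword arguments to launch_model when the MLX engine is selected; the model name, format, and quantization values are placeholders.

```python
# Sketch: capping the MLX cache when launching a model on Apple silicon.
# Assumption: extra keyword arguments such as cache_limit_gb are passed
# through to the MLX engine's model configuration.
from xinference.client import Client

client = Client("http://127.0.0.1:9997")

model_uid = client.launch_model(
    model_name="qwen2-instruct",
    model_engine="mlx",
    model_size_in_billions=7,
    model_format="mlx",
    quantization="4-bit",
    cache_limit_gb=8,   # limit the MLX buffer cache to roughly 8 GiB
)
print(model_uid)
```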
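For image_to_image, the sketch below assumes the image model handle returned by get_model exposes an image_to_image method that accepts an input image plus a prompt; the model name and argument names are assumptions for illustration.

```python
# Sketch: calling the new image_to_image capability on a launched image model.
# Assumption: the handle provides image_to_image(image=..., prompt=...);
# the model name and file path are placeholders.
from xinference.client import Client

client = Client("http://127.0.0.1:9997")

model_uid = client.launch_model(
    model_name="stable-diffusion-xl-base-1.0",
    model_type="image",
)
model = client.get_model(model_uid)

with open("input.png", "rb") as f:
    result = model.image_to_image(
        image=f.read(),
        prompt="turn the sketch into a watercolor painting",
    )
print(result)
```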
Bug fixes
- BUG: Fix GLM chat by @codingl2k1 in #1966
- BUG: Fix transformers engine matching for registered models by @qinxuye in #1955
- BUG: Fix failure to load llama.so in the Docker image by @ChengjieLi28 in #1974
- BUG: [UI] Fix error when modifying 'model format' again by @yiboyasss in #1990
- BUG: Fix loading multiple GGUF parts by @qinxuye in #1987
Documentation
- DOC: Ascend support by @qinxuye in #1978
- DOC: Add CosyVoice doc by @qinxuye in #1980
- DOC: Documentation for K8s by @ChengjieLi28 in #2004
New Contributors
- @vikrantrathore made their first contribution in #1929
- @Valdanitooooo made their first contribution in #1918
Full Changelog: v0.13.3...v0.14.0