What's new in 0.14.0 (2024-08-02)
These are the changes in inference v0.14.0.
New features
- FEAT: Support model_path input when launching models by @Valdanitooooo in #1918 (see the sketch after this list)
- FEAT: Support gte-Qwen2-7B-instruct and multi-GPU deployment by @amumu96 in #1994
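The model_path feature lets a launch point at local weights instead of pulling them from a model hub. Below is a minimal sketch using the Python client; the endpoint, model name, local path, and the exact set of launch keyword arguments are illustrative assumptions rather than the definitive interface of the merged PR.

```python
# Sketch: launching a model whose weights already live on local disk.
# The endpoint, model name, and path below are placeholders.
from xinference.client import Client

client = Client("http://127.0.0.1:9997")

# Assumption: launch_model forwards `model_path` so the worker loads weights
# from this directory instead of downloading them from a model hub.
model_uid = client.launch_model(
    model_name="qwen2-instruct",
    model_engine="transformers",
    model_size_in_billions=7,
    model_format="pytorch",
    quantization="none",
    model_path="/data/models/qwen2-7b-instruct",
)
print(model_uid)
```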
Enhancements
- ENH: Add SGLang support for Llama 3 and Qwen 2 by @luweizheng in #1947
- ENH: Add cache_limit_gb option for MLX by @qinxuye in #1954 (see the sketch after this list)
- ENH: [benchmark] Add api-key support by @frostyplanet in #1961
- ENH: Support Gemma 2 and Llama 3.1 models for vLLM & SGLang by @vikrantrathore in #1929
- ENH: [K8s] Worker log directory name by @ChengjieLi28 in #1997
- ENH: Support image_to_image by @qinxuye in #1986 (see the sketch after this list)
- REF: Enable SGLang by default by @qinxuye in #1953
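For the new cache_limit_gb option, the sketch below assumes that engine-specific options are forwarded as extra keyword arguments to launch_model when the MLX engine is selected; the model name, format, and quantization values are placeholders.

```python
# Sketch: capping the MLX cache when launching a model on Apple silicon.
# Assumption: extra keyword arguments such as cache_limit_gb are passed
# through to the MLX engine's model configuration.
from xinference.client import Client

client = Client("http://127.0.0.1:9997")

model_uid = client.launch_model(
    model_name="qwen2-instruct",
    model_engine="mlx",
    model_size_in_billions=7,
    model_format="mlx",
    quantization="4-bit",
    cache_limit_gb=8,   # limit the MLX buffer cache to roughly 8 GiB
)
print(model_uid)
```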
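For image_to_image, the sketch below assumes the image model handle returned by get_model exposes an image_to_image method that accepts an input image plus a prompt; the model name and argument names are assumptions for illustration.

```python
# Sketch: calling the new image_to_image capability on a launched image model.
# Assumption: the handle provides image_to_image(image=..., prompt=...);
# the model name and file path are placeholders.
from xinference.client import Client

client = Client("http://127.0.0.1:9997")

model_uid = client.launch_model(
    model_name="stable-diffusion-xl-base-1.0",
    model_type="image",
)
model = client.get_model(model_uid)

with open("input.png", "rb") as f:
    result = model.image_to_image(
        image=f.read(),
        prompt="turn the sketch into a watercolor painting",
    )
print(result)
```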
Bug fixes
- BUG: Fix GLM chat by @codingl2k1 in #1966
- BUG: Fix transformers engine matching for registered models by @qinxuye in #1955
- BUG: Fix failure to load llama.so in the Docker image by @ChengjieLi28 in #1974
- BUG: [UI] Fix error when modifying 'model format' again by @yiboyasss in #1990
- BUG: Fix loading multiple GGUF parts by @qinxuye in #1987
Documentation
- DOC: Ascend support by @qinxuye in #1978
- DOC: Add CosyVoice doc by @qinxuye in #1980
- DOC: Documentation for K8s by @ChengjieLi28 in #2004
New Contributors
- @vikrantrathore made their first contribution in #1929
- @Valdanitooooo made their first contribution in #1918
Full Changelog: v0.13.3...v0.14.0