What's new in 0.9.3 (2024-03-15)
These are the changes in inference v0.9.3.
New features
- FEAT: Add Yi-9B by @mujin2 in #1117
- FEAT: Support image generation by @hainaweiben in #1047
Enhancements
- ENH: Update command-line help info by @luweizheng in #1106
- ENH: Remove quantization limits for the Apple Metal device when running models via llama-cpp-python by @ChengjieLi28 in #1134
- ENH: Make GET /v1/models compatible with the OpenAI API by @notsyncing in #1127
- ENH: support vllm>=0.3.1 by @qinxuye in #1145
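With #1127, GET /v1/models returns an OpenAI-compatible model list. A minimal sketch of parsing such a response; the payload below is an illustrative assumption, not actual Xinference server output:

```python
import json

# Hypothetical OpenAI-style "list" payload, as GET /v1/models would return
# after #1127. The model id and field values here are assumptions.
sample = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "yi-chat", "object": "model", "created": 0, "owned_by": "xinference"}
  ]
}
""")

def model_ids(payload: dict) -> list:
    """Extract model ids from an OpenAI-compatible /v1/models response."""
    assert payload.get("object") == "list"
    return [m["id"] for m in payload.get("data", [])]

print(model_ids(sample))  # ['yi-chat']
```

Because the shape matches the OpenAI API, existing OpenAI client code can list Xinference models without changes.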
Bug fixes
- BUG: Fix a useless f-string by @mikeshi80 in #1130
- BUG: Fix model list loading failure caused by a large number of invalid requests on the model list page by @wertycn in #1111
- BUG: Fix cache status for embedding, rerank and image models on the web UI by @ChengjieLi28 in #1135
- BUG: Fix missing information in the xinference registrations and xinference list commands by @ChengjieLi28 in #1140
- BUG: Fix being unable to continue chatting after canceling a streaming chat via ctrl+c by @ChengjieLi28 in #1144
Tests
- TST: Remove testing LLM model creating embedding by @ChengjieLi28 in #1121
New Contributors
- @luweizheng made their first contribution in #1106
- @mujin2 made their first contribution in #1117
- @wertycn made their first contribution in #1111
Full Changelog: v0.9.2...v0.9.3