What's new in 0.11.0 (2024-05-11)

These are the changes in inference v0.11.0.

Breaking Changes

v0.11.0 introduces a breaking change: model_engine must now be specified when launching a model. Refer to Model Engine for more information.
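
As a rough illustration of the breaking change above, the sketch below builds launch arguments that always carry an explicit model_engine and passes them to the Python client. The helper names (`build_launch_kwargs`, `launch`), the engine value "transformers", the chatglm3 model settings, and the endpoint URL are illustrative assumptions, not taken from these notes; check the Model Engine documentation for the engine names valid for your model.

```python
from typing import Any, Dict, Optional


def build_launch_kwargs(
    model_name: str,
    model_engine: str,
    model_format: Optional[str] = None,
    model_size_in_billions: Optional[int] = None,
    quantization: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect launch arguments, insisting on an explicit model_engine.

    Hypothetical helper: it only mirrors the v0.11.0 requirement locally.
    """
    if not model_engine:
        raise ValueError("model_engine must be specified as of v0.11.0")
    kwargs: Dict[str, Any] = {
        "model_name": model_name,
        "model_engine": model_engine,
    }
    optional = {
        "model_format": model_format,
        "model_size_in_billions": model_size_in_billions,
        "quantization": quantization,
    }
    # Only forward the optional settings that were actually provided.
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    return kwargs


def launch(endpoint: str, **kwargs: Any) -> str:
    """Launch a model on a running Xinference server and return its model UID."""
    # Imported lazily so build_launch_kwargs works without xinference installed.
    from xinference.client import Client

    return Client(endpoint).launch_model(**kwargs)


# Assumed example values; the engine name in particular is an assumption.
example_kwargs = build_launch_kwargs(
    "chatglm3",
    "transformers",
    model_format="pytorch",
    model_size_in_billions=6,
)
```

With a server running, `launch("http://localhost:9997", **example_kwargs)` would then request the model with the chosen engine; the address here assumes a local default deployment.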

ENH: add custom image model by @amumu96 in #1312
ENH: Support more quantization with VLLM by @amumu96 in #1372
ENH: Update chatglm3 6b model version by @codingl2k1 in #1401
ENH: make qwen_vl support streaming output by @Minamiyama in #1425
ENH: Remove the max tokens limitation and boost performance by avoiding unnecessary repeated CUDA device detection by @mikeshi80 in #1429
ENH: Improve benchmark and add long-context generation by @frostyplanet in #1423
ENH: make yi_vl support streaming output by @Minamiyama in #1443
ENH: Some minor changes by @frostyplanet in #1453
ENH: make deepseek_vl support streaming output by @Minamiyama in #1444
ENH: Rename model_engine for a clearer inference backend by @ChengjieLi28 in #1466
BLD: Use self-hosted aws machine to build docker image by @ChengjieLi28 in #1405
CLN: Remove actor client by @ChengjieLi28 in #1436
CLN: Remove all speculative-related code by @ChengjieLi28 in #1435
REF: Query for engine by @Ago327 in #1342
REF: [UI] Refactor register model by @yiboyasss in #1368
REF: Add the model_engine parameter for launching process by @hainaweiben in #1367

BUG: Fix llama3-instruct 70B filename error by @ChengjieLi28 in #1370
BUG: Fix error when there is no role:user message or the content is empty by @liuzhenghua in #1378
BUG: fix file template of andrewcanis/c4ai-command-r-v01-GGUF by @emulated24 in #1389
BUG: Fix using extra gpus due to match in __init__ by @ChengjieLi28 in #1400
BUG: Fix qwen tool call parameter empty issue by @codingl2k1 in #1381
BUG: Fix tool calls return invalid usage by @codingl2k1 in #1420
BUG: Fix tools ability by @mikeshi80 in #1447
BUG: Install error on MacOS due to auto-gptq by @ChengjieLi28 in #1457
BUG: fix some issues in query engine interface by @Ago327 in #1442

TST: Pin huggingface-hub to pass CI since it has some breaking changes by @ChengjieLi28 in #1427

DOC: update readme & fix Mac CI by @qinxuye in #1385
DOC: worker address should be specified for xinference-worker by @amumu96 in #1397
DOC: update docker doc in using xinference by @qinxuye in #1417
DOC: add the missing backslash in shell command by @mikeshi80 in #1451
DOC: Usage about model_engine by @ChengjieLi28 in #1468

Full Changelog: v0.10.3...v0.11.0