What's new in 0.11.0 (2024-05-11)
These are the changes in inference v0.11.0.
Breaking Changes
v0.11.0 introduces a breaking change: model_engine must now be specified when launching a model.
Refer to Model Engine for more information.
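As a rough illustration of what this requirement means for calling code, the sketch below assembles launch arguments and refuses to proceed when no engine is given. The helper build_launch_kwargs is hypothetical and not part of Xinference. The model name, engine value, format, and size are illustrative assumptions; check the Model Engine documentation for the engines actually available for a given model.

```python
from typing import Any, Dict


def build_launch_kwargs(model_name: str, model_engine: str, **extra: Any) -> Dict[str, Any]:
    """Hypothetical helper: collect launch arguments, insisting on an engine."""
    if not model_engine:
        # Mirrors the breaking change: an engine must be named explicitly.
        raise ValueError("model_engine must be specified when launching a model")
    return {"model_name": model_name, "model_engine": model_engine, **extra}


# Illustrative values only; none of them come from the release notes.
launch_kwargs = build_launch_kwargs(
    "qwen1.5-chat",
    "Transformers",
    model_format="pytorch",
    model_size_in_billions=7,
)
print(launch_kwargs)

# Assumption: with a running Xinference server, these arguments could be
# forwarded to the Python client, for example:
#   from xinference.client import Client
#   Client("http://localhost:9997").launch_model(**launch_kwargs)
```

Code that launched models before v0.11.0 without naming an engine would need an equivalent model_engine argument added at the call site.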
New features
- FEAT: Support Mixtral-8x22b-instruct-v0.1 by @qinxuye in #1340
- FEAT: add phi-3-mini series by @orangeclk in #1379
- FEAT: add Starling model by @boy-hack in #1384
- FEAT: support qwen1.5 110b by @qinxuye in #1388
- FEAT: Support query engine with cmdline by @Ago327 in #1380
- FEAT: Ascend support by @qinxuye in #1408
- FEAT: Audio support verbose_json and timestamp by @codingl2k1 in #1402
- FEAT: [UI] Add engine option when launching LLM by @yiboyasss in #1456
Enhancements
- ENH: add custom image model by @amumu96 in #1312
- ENH: Support more quantization with VLLM by @amumu96 in #1372
- ENH: Update chatglm3 6b model version by @codingl2k1 in #1401
- ENH: make qwen_vl support streaming output by @Minamiyama in #1425
- ENH: Remove the max tokens limitation and boost performance by avoiding unnecessary repeated CUDA device detection by @mikeshi80 in #1429
- ENH: Improve benchmark and add long context generate by @frostyplanet in #1423
- ENH: make yi_vl support streaming output by @Minamiyama in #1443
- ENH: Some minor changes by @frostyplanet in #1453
- ENH: make deepseek_vl support streaming output by @Minamiyama in #1444
- ENH: Rename model_engine for more clear inference backend by @ChengjieLi28 in #1466
- BLD: Use self-hosted aws machine to build docker image by @ChengjieLi28 in #1405
- CLN: Remove actor client by @ChengjieLi28 in #1436
- CLN: Remove all speculative-related codes by @ChengjieLi28 in #1435
- REF: Query for engine by @Ago327 in #1342
- REF: [UI] Refactor register model by @yiboyasss in #1368
- REF: Add the model_engine parameter for launching process by @hainaweiben in #1367
Bug fixes
- BUG: Fix llama3-instruct 70B filename error by @ChengjieLi28 in #1370
- BUG: no role:user msg or content empty got an error. by @liuzhenghua in #1378
- BUG: fix file template of andrewcanis/c4ai-command-r-v01-GGUF by @emulated24 in #1389
- BUG: Fix using extra gpus due to match in __init__ by @ChengjieLi28 in #1400
- BUG: Fix qwen tool call parameter empty issue by @codingl2k1 in #1381
- BUG: Fix tool calls return invalid usage by @codingl2k1 in #1420
- BUG: Fix tools ability by @mikeshi80 in #1447
- BUG: Install error on MacOS due to auto-gptq by @ChengjieLi28 in #1457
- BUG: fix some issues in query engine interface by @Ago327 in #1442
Tests
- TST: Pin huggingface-hub to pass CI since it has some breaking changes by @ChengjieLi28 in #1427
Documentation
- DOC: update readme & fix Mac CI by @qinxuye in #1385
- DOC: worker address should be specified for xinference-worker by @amumu96 in #1397
- DOC: update docker doc in using xinference by @qinxuye in #1417
- DOC: add the missing backslash in shell command by @mikeshi80 in #1451
- DOC: Usage about model_engine by @ChengjieLi28 in #1468
Others
New Contributors
- @liuzhenghua made their first contribution in #1378
- @emulated24 made their first contribution in #1389
- @orangeclk made their first contribution in #1379
- @boy-hack made their first contribution in #1384
- @frostyplanet made their first contribution in #1423
Full Changelog: v0.10.3...v0.11.0