What's new in 0.5.0 (2023-09-22)
These are the changes in Xinference v0.5.0.
New features
- FEAT: incorporate vLLM by @UranusSeven in #445
- FEAT: add register model page for dashboard by @Bojun-Feng in #420
- FEAT: support InternLM 20B by @UranusSeven in #486
- FEAT: support glaive coder by @UranusSeven in #490
- FEAT: support downloading models from ModelScope by @aresnow1 in #475
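The ModelScope support added in #475 is typically enabled through an environment variable before starting Xinference. A minimal sketch, assuming the variable is named `XINFERENCE_MODEL_SRC` and accepts the value `modelscope` (an assumption based on common Xinference usage, not stated in this changelog):

```python
import os

# Assumed configuration: point Xinference at ModelScope instead of
# Hugging Face for model downloads. Set this before launching the server.
os.environ["XINFERENCE_MODEL_SRC"] = "modelscope"

# A local server started in this environment would then fetch model
# weights from ModelScope, e.g. (hypothetical invocation, commented out):
#   subprocess.run(["xinference-local", "--host", "0.0.0.0", "--port", "9997"])
```

This is useful in regions where Hugging Face downloads are slow or blocked; the server process must inherit the variable, so export it in the same shell or service unit that starts Xinference.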
Enhancements
- ENH: shorten OpenBuddy's description by @UranusSeven in #471
- ENH: enable vLLM on Linux with CUDA by @UranusSeven in #472
- ENH: vLLM engine supports more models by @UranusSeven in #477
- ENH: remove subpool on failure by @UranusSeven in #478
- ENH: support trust_remote_code when launching a model by @UranusSeven in #479
- ENH: vLLM auto tensor parallel by @UranusSeven in #480
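Several of these enhancements surface as options at model-launch time. The sketch below is purely illustrative: the field names mirror the PR titles (`trust_remote_code` from #479, automatic tensor parallelism from #480), but the exact request schema is an assumption, not taken from this changelog.

```python
# Hypothetical launch request a client might send to an Xinference server;
# keys here are illustrative assumptions, not the documented API schema.
launch_request = {
    "model_name": "internlm-20b",   # e.g. the InternLM 20B model added in #486
    "model_format": "pytorch",
    "trust_remote_code": True,      # #479: allow model repos that ship custom code
    "n_gpu": "auto",                # #480: let the vLLM engine pick tensor parallel size
}

# Models whose repositories contain custom modeling code will fail to load
# unless trust_remote_code is granted, so set it explicitly for such models.
assert launch_request["trust_remote_code"] is True
```

Note that `trust_remote_code=True` executes code from the downloaded repository, so it should only be enabled for models from sources you trust.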
Bug fixes
- BUG: llama-cpp version mismatch by @Bojun-Feng in #473
- BUG: incorrect endpoint on host 0.0.0.0 by @UranusSeven in #474
- BUG: prompt style not set as expected on web UI by @UranusSeven in #489
Full Changelog: v0.4.4...v0.5.0