What's new in 0.9.2 (2024-03-08)
These are the changes in inference v0.9.2.
New features
- FEAT: Add a command / SDK interface to query which models are able to… by @hainaweiben in #1076
- FEAT: add a docker-compose-distributed example with multiple workers by @bufferoverflow in #1064
- FEAT: Support download and merge multiple parts of gguf files by @notsyncing in #1075
- FEAT: Supports LoRA for LLM and image models by @ChengjieLi28 in #1080
Enhancements
- ENH: Supports `n_gpu_layers` parameter for `llama-cpp-python` by @ChengjieLi28 in #1070
- ENH: Add a dropdown to the web UI to support adjusting GPU offload layers for llama.cpp loader by @notsyncing in #1073
- ENH: [UI] Show `replica` on running model page by @ChengjieLi28 in #1093
- ENH: Add "[DONE]" to the end of stream generation for better openai SDK compatibility by @ZhangTianrong in #1062
- ENH: [UI] Support setting `CPU` when selecting n_gpu by @ChengjieLi28 in #1096
Documentation
- DOC: Extra parameters for launching models by @aresnow1 in #1077
- DOC: contribution doc by @Ago327 in #1092
- DOC: doc for lora by @ChengjieLi28 in #1103
Others
- Update llm_family.json to correct the context length of glaive-coder by @mikeshi80 in #1083
New Contributors
- @mikeshi80 made their first contribution in #1083
- @bufferoverflow made their first contribution in #1064
- @Ago327 made their first contribution in #1092
Full Changelog: v0.9.1...v0.9.2