What's new in 0.9.2 (2024-03-08)
These are the changes in inference v0.9.2.
New features
- FEAT: Add a command / SDK interface to query which models are able to… by @hainaweiben in #1076
- FEAT: add a docker-compose-distributed example with multiple workers by @bufferoverflow in #1064
- FEAT: Support download and merge multiple parts of gguf files by @notsyncing in #1075
- FEAT: Supports LoRA for LLM and image models by @ChengjieLi28 in #1080
Enhancements
- ENH: Supports `n_gpu_layers` parameter for `llama-cpp-python` by @ChengjieLi28 in #1070
- ENH: Add a dropdown to the web UI to support adjusting GPU offload layers for llama.cpp loader by @notsyncing in #1073
- ENH: [UI] Show `replica` on running model page by @ChengjieLi28 in #1093
- ENH: Add "[DONE]" to the end of stream generation for better openai SDK compatibility by @ZhangTianrong in #1062
- ENH: [UI] Support setting `CPU` when selecting n_gpu by @ChengjieLi28 in #1096
Documentation
- DOC: Extra parameters for launching models by @aresnow1 in #1077
- DOC: contribution doc by @Ago327 in #1092
- DOC: doc for lora by @ChengjieLi28 in #1103
Others
- Update llm_family.json to correct the context length of glaive-coder by @mikeshi80 in #1083
New Contributors
- @mikeshi80 made their first contribution in #1083
- @bufferoverflow made their first contribution in #1064
- @Ago327 made their first contribution in #1092
Full Changelog: v0.9.1...v0.9.2