- Released Vicuna v1.5 based on Llama 2 with 4K and 16K context lengths. Download weights
- Released Chatbot Arena Conversations, a dataset containing 33k conversations with human preferences. Download it here.
- Serving
- Add a multi-model worker that can host multiple models on a single GPU and share base weights for PEFT models. #1866 #1905
- Add AWQ 4-bit quantization support. #2103
- Support more models (Llama 2, Claude 2, ChatGLM 2, StarChat, Baichuan-13B, InternLM, airoboros, PEFT adapters).
- Better support for AMD GPUs and Intel XPUs. #1954 #2052
- Training