New features
- Support loading pre-trained models from ModelScope Hub by @tastelikefeet in #1700
- Support launching a reward model server in the demo API via specifying `--stage=rm` in `api_demo.py`
- Support using a reward model server in PPO training via specifying `--reward_model_type api`
- Support adjusting the shard size of exported models via the `export_size` argument (see the usage sketch below)
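
A minimal sketch of how these options might be combined on the command line. The entry-point scripts (`api_demo.py`, `train_bash.py`, `export_model.py`), model paths, port, and the exact set of required arguments are assumptions for illustration, not verbatim commands from this release:

```bash
# Launch a reward model server through the demo API
# (assumed entry point: api_demo.py; model path is a placeholder)
python api_demo.py \
    --stage rm \
    --model_name_or_path path_to_reward_model

# Use that server as the reward model in PPO training
# (assumes --reward_model takes the server URL when --reward_model_type is api)
python train_bash.py \
    --stage ppo \
    --model_name_or_path path_to_base_model \
    --reward_model_type api \
    --reward_model http://localhost:8000/v1

# Export the trained model, limiting each checkpoint shard
# (assumes export_size is the shard size in GB)
python export_model.py \
    --model_name_or_path path_to_base_model \
    --export_dir path_to_export \
    --export_size 2
```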
New models
- Base models
- DeepseekLLM-Base (7B/67B)
- Qwen (1.8B/72B)
- Instruct/Chat models
- DeepseekLLM-Chat (7B/67B)
- Qwen-Chat (1.8B/72B)
- Yi-34B-Chat
New datasets
- Supervised fine-tuning datasets
- Preference datasets