New features
- Support loading pre-trained models from ModelScope Hub by @tastelikefeet in #1700
- Support launching a reward model server in demo API via specifying
--stage=rm
inapi_demo.py
- Support using a reward model server in PPO training via specifying
--reward_model_type api
- Support adjusting the shard size of exported models via the
export_size
argument
New models
- Base models
- DeepseekLLM-Base (7B/67B)
- Qwen (1.8B/72B)
- Instruct/Chat models
- DeepseekLLM-Chat (7B/67B)
- Qwen-Chat (1.8B/72B)
- Yi-34B-Chat
New datasets
- Supervised fine-tuning datasets
- Preference datasets