New features
- Support block expansion in LLaMA Pro, see `tests/llama_pro.py` for usage
- Add `use_rslora` option for the LoRA method (see the scaling sketch after this list)
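
As a rough illustration of what the new `use_rslora` option toggles, the sketch below contrasts the rank-stabilized LoRA scaling rule with the vanilla LoRA rule. It is a minimal, self-contained example; the helper function is hypothetical and not part of the LLaMA-Factory API.

```python
# Illustrative sketch (not LLaMA-Factory code): the only difference between
# vanilla LoRA and rsLoRA is the scaling factor applied to the low-rank update.
import math

def lora_scaling(lora_alpha: float, r: int, use_rslora: bool = False) -> float:
    """Return the multiplier applied to the low-rank BA update.

    Vanilla LoRA scales by alpha / r, which shrinks the update as the rank
    grows. Rank-stabilized LoRA (rsLoRA) scales by alpha / sqrt(r) instead,
    keeping the update magnitude stable at higher ranks.
    """
    return lora_alpha / math.sqrt(r) if use_rslora else lora_alpha / r

# With alpha=16 and r=64: vanilla LoRA gives 0.25, rsLoRA gives 2.0.
print(lora_scaling(16, 64), lora_scaling(16, 64, use_rslora=True))
```

In practice, rsLoRA keeps higher-rank adapters from being scaled down too aggressively, which is why it is useful to expose as a per-run option.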
New models
- Base models
  - Qwen1.5 (0.5B/1.8B/4B/7B/14B/72B)
  - DeepSeekMath-7B-Base
  - DeepSeekCoder-7B-Base-v1.5
  - Orion-14B-Base
- Instruct/Chat models
  - Qwen1.5-Chat (0.5B/1.8B/4B/7B/14B/72B)
  - MiniCPM-2B-SFT/DPO
  - DeepSeekMath-7B-Instruct
  - DeepSeekCoder-7B-Instruct-v1.5
  - Orion-14B-Chat
  - Orion-14B-Long-Chat
  - Orion-14B-RAG-Chat
  - Orion-14B-Plugin-Chat
New datasets
- Supervised fine-tuning datasets
  - SlimOrca (en)
  - Dolly (de)
  - Dolphin (de)
  - Airoboros (de)
- Preference datasets
  - Orca DPO (de)
Bug fixes
- Fix `torch_dtype` check in export model by @fenglui in #2262
- Add Russian locale to LLaMA Board by @seoeaa in #2264
- Remove manually set `use_cache` in export model by @yhyu13 in #2266
- Fix DeepSpeed ZeRO-3 training with MoE models by @A-Cepheus in #2283
- Add a patch for full training of the Mixtral model with DeepSpeed ZeRO-3 by @ftgreat in #2319
- Fix bug in data pre-processing by @lxsyz in #2411
- Add German SFT and DPO datasets by @johannhartmann in #2423
- Add version checking in `test_toolcall.py` by @mini-tiger in #2435
- Enable parsing of the SlimOrca dataset by @mnmueller in #2462
- Add tags for models when pushing to the Hugging Face Hub by @younesbelkada in #2474
- Fix #2189 #2268 #2282 #2320 #2338 #2376 #2388 #2394 #2397 #2404 #2412 #2420 #2421 #2436 #2438 #2471 #2481