We have released our paper on arXiv! Thanks to all co-authors and to AK for the recommendation
New features
- Support the GaLore algorithm, enabling full-parameter fine-tuning of a 7B model with less than 24GB of VRAM
- Support FSDP+QLoRA, enabling QLoRA fine-tuning of a 70B model on 2x 24GB GPUs
- Support LoRA+ algorithm for better LoRA fine-tuning by @qibaoyuan in #2830
- LLaMA Factory 🤝 vLLM, enjoy 270% inference speed with `--infer_backend vllm`
- Add a Colab notebook for getting started easily
- Support pushing fine-tuned models to the Hugging Face Hub in the web UI
- Support `apply_chat_template` by adding a chat template to the tokenizer after fine-tuning
- Add dockerize support by @S3Studio in #2743 #2849
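Since the fine-tuned tokenizer now carries its own chat template, downstream code can format conversations without hand-building prompt strings. As a rough illustration of what a chat template renders (a minimal sketch assuming a ChatML-style format; `render_chatml` is a hypothetical helper for illustration, not part of LLaMA Factory):

```python
# Minimal sketch of what a chat template renders.
# Real templates are Jinja strings stored on the tokenizer and applied
# via tokenizer.apply_chat_template; this mimics a ChatML-style layout.

def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {role, content} dicts into a ChatML prompt string."""
    prompt = ""
    for message in messages:
        prompt += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        prompt += "<|im_start|>assistant\n"
    return prompt

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
print(render_chatml(messages))
```

With the real Transformers API, the equivalent call would be `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`, which uses whatever template was saved with the fine-tuned model.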
New models
- Base models
- OLMo (1B/7B)
- StarCoder2 (3B/7B/15B)
- Yi-9B
- Instruct/Chat models
- OLMo-7B-Instruct
New datasets
- Supervised fine-tuning datasets
- Cosmopedia (en)
- Preference datasets
- Orca DPO (en)
Bug fixes
- Fix flash_attn in web UI by @cx2333-gt in #2730
- Fix deepspeed runtime error in PPO by @stephen-nju in #2746
- Fix readme ddp instruction by @khazic in #2903
- Fix environment variable in datasets by @SirlyDreamer in #2905
- Fix readme information by @0xez in #2919
- Fix generation config validation by @marko1616 in #2945
- Fix requirements by @rkinas in #2963
- Fix bitsandbytes windows version by @Tsumugii24 in #2967
- Fix #2346 #2642 #2649 #2732 #2735 #2756 #2766 #2775 #2777 #2782 #2798 #2802 #2803 #2817 #2895 #2928 #2936 #2941