IPO & KTO & cDPO loss, DPOTrainer enhancements, automatic tags for xxxTrainer
Important enhancements for DPOTrainer
This release introduces many new features in TRL for `DPOTrainer`:

- IPO loss for better generalization of the DPO algorithm
- KTO & cDPO loss
- You can also pass pre-computed reference model log-probabilities to `DPOTrainer`
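The new losses all act on the same per-pair quantity: the difference between the policy's and the reference model's chosen-vs-rejected log-ratios. As a rough illustration only (a simplified sketch with illustrative names, not TRL's actual implementation), the standard DPO (sigmoid) loss, its label-smoothed cDPO variant, and the IPO loss for a single preference pair can be written as:

```python
import math

def log_sigmoid(x: float) -> float:
    # log(sigmoid(x)) = -log(1 + exp(-x)), split for numerical stability
    return x - math.log1p(math.exp(x)) if x < 0 else -math.log1p(math.exp(-x))

def preference_losses(policy_logratio: float, ref_logratio: float,
                      beta: float = 0.1, label_smoothing: float = 0.0):
    """Simplified per-pair losses; argument names are illustrative.

    policy_logratio = log pi(chosen) - log pi(rejected) under the policy,
    ref_logratio    = the same quantity under the reference model.
    """
    logits = policy_logratio - ref_logratio
    # Standard DPO loss; label_smoothing > 0 gives the cDPO variant,
    # which treats preference labels as noisy with that probability.
    sigmoid_loss = (-(1 - label_smoothing) * log_sigmoid(beta * logits)
                    - label_smoothing * log_sigmoid(-beta * logits))
    # IPO instead regresses the log-ratio gap towards 1 / (2 * beta).
    ipo_loss = (logits - 1 / (2 * beta)) ** 2
    return sigmoid_loss, ipo_loss
```

In `DPOTrainer` these variants are selected through its `loss_type` argument; see the PRs below for the exact option names and signatures.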
- [DPO] Refactor eval logging of dpo trainer by @mnoukhov in #954
- Fixes reward and text gathering in distributed training by @edbeeching in #850
- remove spurious optimize_cuda_cache deprecation warning on init by @ChanderG in #1045
- Revert "[DPO] Refactor eval logging of dpo trainer (#954)" by @lvwerra in #1047
- Fix DPOTrainer + PEFT 2 by @rdk31 in #1049
- [DPO] IPO Training loss by @kashif in #1022
- [DPO] cDPO loss by @kashif in #1035
- [DPO] use ref model logprobs if it exists in the data by @kashif in #885
- [DPO] save eval_dataset for subsequent calls by @kashif in #1125
- [DPO] rename kto loss by @kashif in #1127
- [DPO] add KTO loss by @kashif in #1075
Automatic `xxxTrainer` tagging on the Hub

Trainers from TRL now automatically push the tags `trl-sft`, `trl-dpo`, and `trl-ddpo` when pushing models to the Hub.

- [`xxxTrainer`] Add tags to all trainers in TRL by @younesbelkada in #1120
unsloth 🤝 TRL
We encourage users to try out the unsloth library for faster LLM fine-tuning with PEFT and TRL's `SFTTrainer` and `DPOTrainer`
- [Docs] Add unsloth optimizations in TRL's documentation by @younesbelkada in #1119
What's Changed
- set dev version by @younesbelkada in #970
- [Tests] Add non optional packages tests by @younesbelkada in #974
- [DOCS] Fix outdated references to `examples/` by @alvarobartt in #977
- Update README.md by @GeekDream-x in #994
- [`DataCollatorForCompletionOnlyLM`] Warn on identical `eos_token_id` and `pad_token_id` by @MustSave in #988
- [`DataCollatorForCompletionOnlyLM`] Add more clarification / guidance in the case `tokenizer.pad_token_id == tokenizer.eos_token_id` by @younesbelkada in #992
- make distributed true for multiple process by @allanj in #997
- Fixed wrong trigger for warning by @zabealbe in #971
- Update how_to_train.md by @halfrot in #1003
- Adds `requires_grad` to input for non-quantized peft models by @younesbelkada in #1006
- [Multi-Adapter PPO] Fix and Refactor reward model adapter by @mnoukhov in #982
- Remove duplicate data loading in rl_training.py by @viethoangtranduong in #1020
- [Document] Minor fixes of sft_trainer document by @mutichung in #1029
- Update utils.py by @ZihanWang314 in #1012
- spelling is hard by @grahamannett in #1043
- Fixing accelerator version function call. by @ParthaEth in #1056
- [SFT Trainer] precompute packed iterable into a dataset by @lvwerra in #979
- Update doc CI by @lewtun in #1060
- Improve PreTrainedModelWrapper._get_current_device by @billvsme in #1048
- Update doc for the `compute_metrics` argument of SFTTrainer by @albertauyeung in #1062
- [`core`] Fix failing tests on main by @younesbelkada in #1065
- [`SFTTrainer`] Fix Trainer when args is None by @younesbelkada in #1064
- enable multiple eval datasets by @peter-sk in #1052
- Add missing `loss_type` in `ValueError` message by @alvarobartt in #1067
- Add args to SFT example by @lewtun in #1079
- add local folder support as input for rl_training. by @sywangyi in #1078
- Make CI happy by @younesbelkada in #1080
- Removing `tyro` in `sft_llama2.py` by @vwxyzjn in #1081
- Log arg consistency by @tcapelle in #1084
- Updated documentation for docs/source/reward_trainer.mdx to import th… by @cm2435 in #1092
- [Feature] Add Ascend NPU accelerator support by @statelesshz in #1096
- `peft_module_casting_to_bf16` util method, `append_concat_token` flag, remove callback `PeftSavingCallback` by @pacman100 in #1110
- Make prepending of bos token configurable. by @pacman100 in #1114
- fix gradient checkpointing when using PEFT by @pacman100 in #1118
- Update `description` in `setup.py` by @alvarobartt in #1101
New Contributors
- @alvarobartt made their first contribution in #977
- @GeekDream-x made their first contribution in #994
- @MustSave made their first contribution in #988
- @allanj made their first contribution in #997
- @zabealbe made their first contribution in #971
- @viethoangtranduong made their first contribution in #1020
- @mutichung made their first contribution in #1029
- @ZihanWang314 made their first contribution in #1012
- @grahamannett made their first contribution in #1043
- @ChanderG made their first contribution in #1045
- @rdk31 made their first contribution in #1049
- @ParthaEth made their first contribution in #1056
- @billvsme made their first contribution in #1048
- @albertauyeung made their first contribution in #1062
- @peter-sk made their first contribution in #1052
- @sywangyi made their first contribution in #1078
- @tcapelle made their first contribution in #1084
- @cm2435 made their first contribution in #1092
- @statelesshz made their first contribution in #1096
- @pacman100 made their first contribution in #1110
Full Changelog: v0.7.4...v0.7.5