DDPO for diffusion models

We are excited to welcome the first RLHF algorithm for diffusion models, which refines the generations these models produce.
Read more about it directly in the docs.

[Figure: sample generations before and after DDPO finetuning]

Denoising Diffusion Policy Optimization by @metric-space in #508

Bug fixes and other enhancements

The release also comes with multiple bug fixes reported and/or led by the community. Check out the commit history below.

What's Changed

Release: v0.5.0 by @younesbelkada in #607
Set dev version by @younesbelkada in #608
[Modeling] Add token support for hf_hub_download by @younesbelkada in #604
Add docs explaining logged metrics by @vwxyzjn in #616
[DPO] stack-llama-2 training scripts by @kashif in #611
Use log_with argument in SFT example by @hitorilabs in #620
Allow already tokenized sequences for response_template in DataCollatorForCompletionOnlyLM by @ivsanro1 in #622
Improve docs by @lvwerra in #612
Move repo by @lvwerra in #628
Add score scaling/normalization/clipping by @zfang in #560
Disable dropout in DPO Training by @NouamaneTazi in #639
Add checks on backward batch size by @vwxyzjn in #651
Resolve various typos throughout the docs by @tomaarsen in #654
Update README.md by @Santosh-Gupta in #657
Allow for ref_model=None in DPOTrainer by @vincentmin in #640
Add more args to SFT example by @photomz in #642
Handle potentially long sequences with DataCollatorForCompletionOnlyLM by @tannonk in #644
[sft_llama2] Add check of arguments by @younesbelkada in #660
Fix DPO blogpost thumbnail by @lvwerra in #673
propagating eval_batch_size to TrainingArguments by @rahuljha in #675
[CI] Fix unmutable TrainingArguments issue by @younesbelkada in #676
Update sft_llama2.py by @msaad02 in #678
fix PeftConfig loading from a remote repo. by @w32zhong in #649
Simplify immutable TrainingArgs fix using dataclasses.replace by @tomaarsen in #682

New Contributors

@hitorilabs made their first contribution in #620
@ivsanro1 made their first contribution in #622
@zfang made their first contribution in #560
@NouamaneTazi made their first contribution in #639
@Santosh-Gupta made their first contribution in #657
@vincentmin made their first contribution in #640
@photomz made their first contribution in #642
@tannonk made their first contribution in #644
@rahuljha made their first contribution in #675
@msaad02 made their first contribution in #678
@w32zhong made their first contribution in #649

Full Changelog: v0.5.0...v0.6.0

huggingface/trl v0.6.0 on GitHub
