## 0.7.2: Flash Attention documentation and minor bugfixes
This release provides minor bugfixes and a smoother user experience across all public classes. We also clarified the documentation on how to use Flash Attention with SFTTrainer.
### How to use Flash Attention with SFTTrainer
- Update sft_trainer.mdx to highlight Flash Attention features by @younesbelkada in #807
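The linked documentation page covers the details; as a minimal sketch (assuming a CUDA GPU with the `flash-attn` package installed and a recent `transformers` version that accepts the `attn_implementation` argument — the model name here is just an illustrative placeholder), enabling Flash Attention when training with `SFTTrainer` looks roughly like:

```python
# Sketch: enabling Flash Attention 2 for SFT training.
# Assumptions: flash-attn is installed, the GPU supports fp16/bf16,
# and transformers is recent enough to accept attn_implementation.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM
from trl import SFTTrainer

# Flash Attention 2 requires half precision, so load in bf16.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)

dataset = load_dataset("imdb", split="train")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,
)
trainer.train()
```

The only Flash Attention-specific pieces are the `torch_dtype` and `attn_implementation` arguments at model load time; the trainer itself needs no changes.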
## What's Changed
- Release: v0.7.1 by @younesbelkada in #709
- set dev version by @younesbelkada in #710
- fix device issue by @backpropper in #681
- Update docs on gms8k by @vwxyzjn in #711
- [`Docs`] Fix sft mistakes by @younesbelkada in #717
- Fix: RuntimeError: 'weight' must be 2-D issue by @jp1924 in #687
- Add pyproject.toml by @mnoukhov in #690
- [`core`] Bump peft to 0.4.0 by @younesbelkada in #720
- Refactor RewardTrainer hyperparameters into dedicated dataclass by @lewtun in #726
- Fix DeepSpeed ZeRO-3 in PPOTrainer by @lewtun in #730
- [`SFTTrainer`] Check correctly for condition by @younesbelkada in #668
- Add epsilon to score normalization by @zfang in #727
- Enable gradient checkpointing to be disabled for reward modelling by @lewtun in #725
- [DPO] fixed metrics typo by @kashif in #743
- Seq2Seq model support for DPO by @gaetanlop in #586
- [DPO] fix ref_model by @i4never in #745
- [`core`] Fix import of `randn_tensor` by @younesbelkada in #751
- Add benchmark CI by @vwxyzjn in #752
- Update to `prepare_model_for_kbit_training` by @mnoukhov in #728
- Benchmark CI fix by @vwxyzjn in #755
- EOS token processing for multi-turn DPO by @natolambert in #741
- Extend DeepSpeed integration to ZeRO-{1,2,3} by @lewtun in #758
- Improve benchmark CI by @vwxyzjn in #760
- [PPOTrainer] - add comment of zero masking (from second query token) by @zuoxingdong in #763
- Refactor and benchmark by @vwxyzjn in #662
- Benchmark CI (actual) by @vwxyzjn in #754
- docs: add initial version of docs for `PPOTrainer` by @davidberenstein1957 in #665
- Support fork in benchmark CI by @vwxyzjn in #764
- Update benchmark.yml by @vwxyzjn in #773
- Benchmark CI fix by @vwxyzjn in #775
- Benchmark CI fix by @vwxyzjn in #776
- Update benchmark.yml by @vwxyzjn in #777
- Update benchmark.yml by @vwxyzjn in #778
- Update benchmark.yml by @vwxyzjn in #779
- Update benchmark.yml by @vwxyzjn in #780
- Update benchmark.yml by @vwxyzjn in #781
- Update benchmark.yml by @vwxyzjn in #782
- Ensure `RewardConfig` is backwards compatible by @lewtun in #748
- Temp benchmark ci dir by @vwxyzjn in #765
- Changed the default value of the `log_with` argument by @filippobistaffa in #792
- Add default Optim to DPO example by @natolambert in #759
- Add margin to RM training by @jvhoffbauer in #719
- [`DPO`] Revert "Add default Optim to DPO example (#759)" by @younesbelkada in #799
- Add deepspeed experiment by @vwxyzjn in #795
- [`Docs`] Clarify PEFT docs by @younesbelkada in #797
- Fix docs bug on sft_trainer.mdx by @younesbelkada in #808
- [`PPOTrainer`] Fixes ppo trainer generate nit by @younesbelkada in #798
- Allow passing the token_ids as instruction_template in DataCollatorForCompletionOnlyLM by @devxpy in #749
- init custom eval loop for further DPO evals by @natolambert in #766
- Add RMSProp back to DPO by @natolambert in #821
- [DPO] add option for compute_metrics in DPOTrainer by @kashif in #822
- Small fixes to the PPO trainer doc and script. by @namin in #811
- Unify sentiment documentation by @vwxyzjn in #803
- Fix DeepSpeed ZeRO-{1,2} for DPOTrainer by @lewtun in #825
- Set trust remote code to false by default by @lewtun in #833
- [MINOR:TYPOS] Update README.md by @cakiki in #829
- Clarify docstrings, help messages, assert messages in merge_peft_adapter.py by @larekrow in #838
- add DDPO to index by @lvwerra in #826
- Raise error in `create_reference_model()` when ZeRO-3 is enabled by @lewtun in #840
- Use uniform config by @vwxyzjn in #817
- Give `lewtun` power by @lvwerra in #856
- Standardise example scripts by @lewtun in #842
- Fix version check in import_utils.py by @adampauls in #853
- dont use get_peft_model if model is already peft by @abhishekkrthakur in #857
- [`core`] Fix import issues by @younesbelkada in #859
- Support both old and new diffusers import path by @osanseviero in #843
## New Contributors
- @backpropper made their first contribution in #681
- @jp1924 made their first contribution in #687
- @i4never made their first contribution in #745
- @zuoxingdong made their first contribution in #763
- @davidberenstein1957 made their first contribution in #665
- @filippobistaffa made their first contribution in #792
- @devxpy made their first contribution in #749
- @namin made their first contribution in #811
- @cakiki made their first contribution in #829
- @larekrow made their first contribution in #838
- @adampauls made their first contribution in #853
- @abhishekkrthakur made their first contribution in #857
- @osanseviero made their first contribution in #843
**Full Changelog**: v0.7.1...v0.7.2