huggingface/trl v0.7.11
v0.7.11: IPO & DPO fixes, faster data processing for multi-GPU, automatic tagging for all models


DPO important fixes

We fixed an issue with the IPO loss, leading to results consistent with the latest experiments: per-token log-probabilities are now averaged rather than summed when `loss_type="ipo"` (a sketch of the distinction follows the list):

  • [DPO] average_log_prob when loss is IPO by @kashif in #1265
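
A minimal sketch of the distinction, assuming the usual DPO/IPO loss shapes (the helpers below are illustrative, not TRL's exact code): IPO is defined on length-averaged completion log-probabilities, so the trainer now averages per-token log-probs for `loss_type="ipo"`, while the standard DPO sigmoid loss keeps the sum.

```python
import torch
import torch.nn.functional as F

def completion_logps(per_token_logps, mask, average):
    """Reduce per-token log-probs over each completion.

    average=True length-normalizes (IPO); average=False sums (DPO).
    """
    summed = (per_token_logps * mask).sum(-1)
    return summed / mask.sum(-1) if average else summed

def preference_loss(chosen_logps, rejected_logps,
                    ref_chosen_logps, ref_rejected_logps,
                    beta=0.1, loss_type="ipo"):
    # Log-ratio of policy vs. reference, chosen minus rejected.
    logits = (chosen_logps - rejected_logps) - (ref_chosen_logps - ref_rejected_logps)
    if loss_type == "ipo":
        # IPO: squared loss around 1/(2*beta), computed on *averaged* log-probs.
        return (logits - 1 / (2 * beta)) ** 2
    # DPO: sigmoid loss, computed on *summed* log-probs.
    return -F.logsigmoid(beta * logits)
```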

We also fixed important bugs in the interaction between DPO, PEFT, and Flash Attention.

Other DPO bug fixes (a usage sketch of the new PEFT guard follows the list):

  • [PEFT + DPO] Raise value error if one passes a ref_model and a peft_config by @younesbelkada in #1289
  • Fix wrong variable name in DPOTrainer documentation example by @ouhenio in #1280
  • fix padding in dpo trainer by @pacman100 in #1284
  • Fix AttributeError in dpo_trainer for reference_free case in dpo_loss function by @maliozer in #1313
  • [DPOTrainer] Add multiprocessing for the eval_dataset map by @esceptico in #1307
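
For the PEFT + DPO guard, a hedged sketch of the intended usage (the model, hyperparameters, and tiny dataset are illustrative placeholders): when `peft_config` is passed, leave `ref_model=None` and the trainer uses the base model with adapters disabled as the implicit reference; passing both now raises a `ValueError`.

```python
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default

# Tiny illustrative dataset with the columns DPOTrainer expects.
train_dataset = Dataset.from_dict({
    "prompt": ["What color is the sky?"],
    "chosen": [" Blue."],
    "rejected": [" Green."],
})

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with peft_config set, also passing a ref_model now raises ValueError
    beta=0.1,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
)
```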

Faster data processing and other enhancements

Data processing is now faster in multi-GPU environments.
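
A minimal sketch of the general mechanism, assuming the standard `datasets.Dataset.map` API (the toy dataset and mapped function are placeholders): parallelizing the preprocessing map with `num_proc` keeps tokenization from becoming the bottleneck when several GPU workers depend on it.

```python
from datasets import Dataset

# Toy dataset standing in for a real preference dataset.
ds = Dataset.from_dict({"prompt": [f"question {i}" for i in range(1000)]})

# num_proc spreads the (often tokenization-heavy) map across processes.
ds = ds.map(lambda ex: {"n_chars": len(ex["prompt"])}, num_proc=4)
```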

Automatic tagging for all models

Models now get tagged correctly even if users do not call `trainer.push_to_hub()`.
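
A hedged sketch of what this means in practice, assuming transformers' `add_model_tags` helper (the tags and repo id below are placeholders): the trainer attaches its tags to the model object itself during training, so the tags reach the Hub model card even without `trainer.push_to_hub()`.

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")

# TRL trainers now attach their tags directly on the model object ...
model.add_model_tags(["trl", "dpo"])

# ... so a plain push still carries them to the model card.
# model.push_to_hub("my-user/my-dpo-model")  # placeholder repo id, needs auth
```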

What's Changed

New Contributors

Full Changelog: v0.7.10...v0.7.11
