DPO important fixes
We fixed issues with the IPO loss, leading to consistent results in the newest experiments.
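As a minimal sketch of the fixed IPO objective in use (gpt2 and the toy preference dataset are stand-ins, not from the release), the IPO loss is selected through DPOTrainer's loss_type argument:

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# DPO expects "prompt" / "chosen" / "rejected" columns
train_dataset = Dataset.from_dict({
    "prompt": ["The capital of France is"],
    "chosen": [" Paris."],
    "rejected": [" Berlin."],
})

trainer = DPOTrainer(
    model,
    beta=0.1,
    loss_type="ipo",  # use the IPO objective instead of the default sigmoid DPO loss
    args=TrainingArguments(output_dir="dpo-ipo", per_device_train_batch_size=1),
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    max_length=64,
    max_prompt_length=32,
)
trainer.train()  # runs DPO training with the IPO loss
```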
We also fixed important bugs affecting DPO combined with PEFT and Flash Attention 2:
- [DPOTrainer] Fix DPO trainer + mistral + FA2 by @younesbelkada in #1290
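For context, a hedged sketch of the setup #1290 fixes: a Mistral model loaded with Flash Attention 2 before being handed to DPOTrainer. The checkpoint name is illustrative, and FA2 additionally requires the flash-attn package, a supported GPU, and half precision.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.bfloat16,               # FA2 only supports fp16/bf16 weights
    attn_implementation="flash_attention_2",  # enable Flash Attention 2 kernels
)
```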
Data processing is now faster in multi-GPU environments:
- [DPOTrainer] Load data only on main process + fix dpo example test by @younesbelkada in #1291
- Add multiprocessing in the DPO trainer by @imraviagrawal in #1286 (see the sketch after this list)
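A sketch of the pattern behind the multiprocessing change in #1286, not TRL's exact code: datasets' map with num_proc shards the dataset and tokenizes each shard in its own worker process.

```python
from datasets import Dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
dataset = Dataset.from_dict({"prompt": [f"example {i}" for i in range(1000)]})

# num_proc=4 splits the dataset into shards and maps them in parallel processes
dataset = dataset.map(lambda row: tokenizer(row["prompt"]), num_proc=4)
```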
Other DPO bugfixes:
- [PEFT+DPO] Raise value error if one passes a ref_model and a peft_config by @younesbelkada in #1289 (see the usage sketch after this list)
- Fix wrong variable name in DPOTrainer documentation example by @ouhenio in #1280
- fix padding in dpo trainer by @pacman100 in #1284
- Fix AttributeError in dpo_trainer for reference_free case in dpo_loss function by @maliozer in #1313
- [DPOTrainer] Add multiprocessing for the eval_dataset map by @esceptico in #1307
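On the #1289 change above: with a peft_config, DPOTrainer uses the base model with adapters disabled as the implicit reference, so an explicit ref_model is redundant and is now rejected. A hedged sketch of the enforced usage, with an illustrative model, toy dataset, and LoRA hyperparameters:

```python
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

train_dataset = Dataset.from_dict({
    "prompt": ["2 + 2 ="],
    "chosen": [" 4"],
    "rejected": [" 5"],
})

trainer = DPOTrainer(
    model,
    ref_model=None,  # passing a ref_model together with peft_config now raises ValueError
    peft_config=LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"),
    args=TrainingArguments(output_dir="dpo-peft", per_device_train_batch_size=1),
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    max_length=32,
    max_prompt_length=16,
)
```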
Faster data processing and other enhancements:
- Only load data on main process by @JohnGiorgi in #1255 (see the sketch after this list)
- Remove tyro by @vwxyzjn in #1176
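The data-loading changes above (#1255 here, #1291 for DPO) follow accelerate's main-process-first pattern. A hedged sketch of the idea, not TRL's exact code: rank 0 runs the map and writes the cache, while the remaining ranks wait at the context exit and then load the cached result instead of recomputing it.

```python
from accelerate import PartialState
from datasets import Dataset

dataset = Dataset.from_dict({"text": [f"sample {i}" for i in range(100)]})

with PartialState().local_main_process_first():
    # computed once on the local main process; other ranks reuse the cache
    dataset = dataset.map(lambda row: {"n_chars": len(row["text"])})
```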
Automatic tagging for all models
Models now get tagged correctly even if users do not call trainer.push_to_hub().
- [core/xxxTrainer] Automatic tagging by @younesbelkada in #1329
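A hedged sketch of the tagging idea, not the exact internals of #1329: transformers models expose add_model_tags, and tags attached this way land in the model card on a later push_to_hub, even when the push happens outside the trainer. The tag names below are illustrative.

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
model.add_model_tags(["trl", "dpo"])  # recorded on the model, surfaced in the model card
# model.push_to_hub("my-user/my-model")  # tags appear on the Hub without trainer.push_to_hub()
```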
What's Changed
- set dev version by @younesbelkada in #1254
- Update Model Generation config to reflect new special tokens by @philschmid in #1256
- Fix a typo in variable name by @otlaitil in #1269
- Fix SFTTrainer bugs on TRL main by @younesbelkada in #1276
- Fix SFT tuner in CI by @vwxyzjn in #1278
- Fix sft ci by @vwxyzjn in #1279
- Fix DPO slow tests by @younesbelkada in #1292
- Fix sft trainer when args is None by @younesbelkada in #1295
- Fix DPOTrainer docstrings by @alvarobartt in #1298
- Types: Fix PEP 484 implicit-optional compliance by @akx in #1297
- Update sft_trainer.mdx to add note on launching DDP training by @johnowhitaker in #1308
- Codemod Unittest assertions to bare asserts by @akx in #1301
- ENH: Run CI only if relevant files are modified by @younesbelkada in #1309
- Fix typos in docs for Multi Adapter RL (MARL). by @elhusseiniali in #1312
- Fix doc snippet PPOTrainer argument train_dataset -> dataset by @j-cb in #1321
- Best practice recommendation update for dpo_trainer.mdx by @R-seny in #1325
- pre-commit: replace linters + formatters with Ruff; fix some issues by @akx in #1300
- Update README.md to clarify model requirement by @markstur in #1315
- [core/DDPO] Fix diffusers import issue by @younesbelkada in #1314
- [CI] Add tests on transformers peft main on push main by @younesbelkada in #1328
- Release: v0.7.11 by @younesbelkada in #1331
New Contributors
- @otlaitil made their first contribution in #1269
- @JohnGiorgi made their first contribution in #1255
- @ouhenio made their first contribution in #1280
- @imraviagrawal made their first contribution in #1286
- @akx made their first contribution in #1297
- @esceptico made their first contribution in #1307
- @johnowhitaker made their first contribution in #1308
- @elhusseiniali made their first contribution in #1312
- @maliozer made their first contribution in #1313
- @j-cb made their first contribution in #1321
- @R-seny made their first contribution in #1325
- @markstur made their first contribution in #1315
Full Changelog: v0.7.10...v0.7.11