huggingface/trl v0.7.11
v0.7.11: IPO & DPO fixes, faster data processing for multi-GPU, automatic tagging for all models


DPO important fixes

We fixed an issue with the IPO loss, leading to results consistent with the latest experiments: per-token log-probabilities are now averaged rather than summed when `loss_type="ipo"` (a sketch of the distinction follows the list):

  • [DPO] average_log_prob when loss is IPO by @kashif in #1265
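
A minimal sketch of the distinction, assuming the usual DPO/IPO loss shapes (the helpers below are illustrative, not TRL's exact code): IPO is defined on length-averaged completion log-probabilities, so the trainer now averages per-token log-probs for `loss_type="ipo"`, while the standard DPO sigmoid loss keeps the sum.

```python
import torch
import torch.nn.functional as F

def completion_logps(per_token_logps, mask, average):
    """Reduce per-token log-probs over each completion.

    average=True length-normalizes (IPO); average=False sums (DPO).
    """
    summed = (per_token_logps * mask).sum(-1)
    return summed / mask.sum(-1) if average else summed

def preference_loss(chosen_logps, rejected_logps,
                    ref_chosen_logps, ref_rejected_logps,
                    beta=0.1, loss_type="ipo"):
    # Log-ratio of policy vs. reference, chosen minus rejected.
    logits = (chosen_logps - rejected_logps) - (ref_chosen_logps - ref_rejected_logps)
    if loss_type == "ipo":
        # IPO: squared loss around 1/(2*beta), computed on *averaged* log-probs.
        return (logits - 1 / (2 * beta)) ** 2
    # DPO: sigmoid loss, computed on *summed* log-probs.
    return -F.logsigmoid(beta * logits)
```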

We also fixed important bugs in the interaction between DPO, PEFT, and Flash Attention.

Other DPO bug fixes (a usage sketch of the new PEFT guard follows the list):

  • [PEFT + DPO] Raise value error if one passes a ref_model and a peft_config by @younesbelkada in #1289
  • Fix wrong variable name in DPOTrainer documentation example by @ouhenio in #1280
  • fix padding in dpo trainer by @pacman100 in #1284
  • Fix AttributeError in dpo_trainer for reference_free case in dpo_loss function by @maliozer in #1313
  • [DPOTrainer] Add multiprocessing for the eval_dataset map by @esceptico in #1307
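
For the PEFT + DPO guard, a hedged sketch of the intended usage (the model, hyperparameters, and tiny dataset are illustrative placeholders): when `peft_config` is passed, leave `ref_model=None` and the trainer uses the base model with adapters disabled as the implicit reference; passing both now raises a `ValueError`.

```python
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default

# Tiny illustrative dataset with the columns DPOTrainer expects.
train_dataset = Dataset.from_dict({
    "prompt": ["What color is the sky?"],
    "chosen": [" Blue."],
    "rejected": [" Green."],
})

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with peft_config set, also passing a ref_model now raises ValueError
    beta=0.1,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
)
```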

Faster data processing and other enhancements

Data processing is now faster in multi-GPU environments.
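
A minimal sketch of the general mechanism, assuming the standard `datasets.Dataset.map` API (the toy dataset and mapped function are placeholders): parallelizing the preprocessing map with `num_proc` keeps tokenization from becoming the bottleneck when several GPU workers depend on it.

```python
from datasets import Dataset

# Toy dataset standing in for a real preference dataset.
ds = Dataset.from_dict({"prompt": [f"question {i}" for i in range(1000)]})

# num_proc spreads the (often tokenization-heavy) map across processes.
ds = ds.map(lambda ex: {"n_chars": len(ex["prompt"])}, num_proc=4)
```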

Automatic tagging for all models

Models now get tagged correctly even if users do not call `trainer.push_to_hub()`.
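
A hedged sketch of what this means in practice, assuming transformers' `add_model_tags` helper (the tags and repo id below are placeholders): the trainer attaches its tags to the model object itself during training, so the tags reach the Hub model card even without `trainer.push_to_hub()`.

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")

# TRL trainers now attach their tags directly on the model object ...
model.add_model_tags(["trl", "dpo"])

# ... so a plain push still carries them to the model card.
# model.push_to_hub("my-user/my-dpo-model")  # placeholder repo id, needs auth
```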

What's Changed

New Contributors

Full Changelog: v0.7.10...v0.7.11
