github huggingface/trl v0.7.8
v0.7.8: Unsloth tag, DPO fixes, PEFT support for DDPO

10 months ago

Unsloth tag for xxxTrainer

If you use the Unsloth library, the unsloth tag is automatically added when the model is pushed to the Hub.
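As a rough illustration of automatic tagging, the sketch below merges a trainer's default tags, an optional unsloth tag, and any user-supplied tags into one list before a push. It is not TRL's actual code: the function name merge_hub_tags, its parameters, and the example tag values are all hypothetical and exist only for this example.

```python
from typing import Iterable, List, Optional, Union


def merge_hub_tags(
    base_tags: Iterable[str],
    uses_unsloth: bool,
    user_tags: Optional[Union[str, List[str]]] = None,
) -> List[str]:
    """Build the tag list for a Hub push (hypothetical helper, not a TRL API)."""
    tags = list(base_tags)

    # Add the library tag only when the model was actually patched by Unsloth.
    if uses_unsloth and "unsloth" not in tags:
        tags.append("unsloth")

    # Accept a single tag or a list of tags from the user.
    if isinstance(user_tags, str):
        user_tags = [user_tags]
    for tag in user_tags or []:
        if tag not in tags:
            tags.append(tag)

    return tags


print(merge_hub_tags(["trl", "sft"], uses_unsloth=True, user_tags="my-experiment"))
# → ['trl', 'sft', 'unsloth', 'my-experiment']
```

Deduplicating while preserving order keeps the resulting tag list stable across repeated pushes.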

DPO fixes

Some important DPO fixes have been introduced to address the issue reported in https://twitter.com/jon_durbin/status/1743575483365699809 and to make DPO faster:

  • Allow separate devices for target/ref models by @jondurbin in #1190
  • Allow swapping PEFT adapters for target/ref model by @jondurbin in #1193
  • Change device access order for speedup of calculating metrics in DPOTrainer by @brcps12 in #1154
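The idea behind swapping adapters for the target and reference model can be sketched with a toy example: a single base model holds two named adapters, and the reference scores are computed by temporarily activating the reference adapter and then restoring the training one. Everything below (ToyAdapterModel, use_adapter, and the adapter names "train" and "reference") is a hypothetical stand-in for illustration, not the TRL or PEFT implementation.

```python
from contextlib import contextmanager


class ToyAdapterModel:
    """Hypothetical stand-in for a base model carrying several named adapters."""

    def __init__(self, adapters, active):
        self.adapters = dict(adapters)  # adapter name -> toy scaling weight
        self.active = active

    def set_adapter(self, name):
        if name not in self.adapters:
            raise KeyError(f"unknown adapter: {name}")
        self.active = name

    def score(self, x):
        # The active adapter decides how the input is transformed.
        return self.adapters[self.active] * x


@contextmanager
def use_adapter(model, name):
    """Temporarily activate another adapter, restoring the previous one on exit."""
    previous = model.active
    model.set_adapter(name)
    try:
        yield model
    finally:
        model.set_adapter(previous)


model = ToyAdapterModel({"train": 2.0, "reference": 1.0}, active="train")

policy_score = model.score(3.0)  # computed with the "train" adapter
with use_adapter(model, "reference"):
    reference_score = model.score(3.0)  # computed with the "reference" adapter

print(policy_score, reference_score, model.active)
```

Restoring the previous adapter in a finally block means the training adapter is active again even if the reference pass raises an exception.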

DDPO + PEFT

DDPO now supports PEFT.

Other fixes

New Contributors

Full Changelog: v0.7.7...v0.7.8
