github huggingface/trl v0.7.5
v0.7.5: IPO & KTO & cDPO loss, `DPOTrainer` enhancements, automatic tags for `xxxTrainer`

10 months ago

Important enhancements for DPOTrainer

This release introduces several new features for DPOTrainer:

  • IPO loss, for better generalization of the DPO algorithm
  • KTO and cDPO losses
  • You can also pass pre-computed logits to DPOTrainer
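
To make the difference between these losses concrete, here is a minimal per-example sketch in plain Python. It follows the formulas from the DPO, IPO, and cDPO papers and is not TRL's implementation. The function and argument names (`preference_loss`, `loss_type`, `label_smoothing`) are assumptions chosen for illustration. KTO is omitted because it needs additional batch-level terms.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
    loss_type: str = "sigmoid",    # hypothetical switch: "sigmoid" (DPO/cDPO) or "ipo"
    label_smoothing: float = 0.0,  # cDPO: assumed probability that a label is flipped
) -> float:
    # Difference between the policy's and the reference model's
    # log-ratios of the chosen over the rejected completion.
    h = (policy_chosen_logp - policy_rejected_logp) - (
        ref_chosen_logp - ref_rejected_logp
    )
    if loss_type == "sigmoid":
        # label_smoothing == 0 gives plain DPO; > 0 gives cDPO.
        return (
            -(1.0 - label_smoothing) * log_sigmoid(beta * h)
            - label_smoothing * log_sigmoid(-beta * h)
        )
    if loss_type == "ipo":
        # IPO regresses the log-ratio gap toward 1 / (2 * beta)
        # instead of pushing it to grow without bound.
        return (h - 1.0 / (2.0 * beta)) ** 2
    raise ValueError(f"unknown loss_type: {loss_type!r}")
```

With `label_smoothing=0.0` the sigmoid branch reduces to the standard DPO loss. The IPO branch is zero exactly when the log-ratio gap equals `1 / (2 * beta)`, which is what keeps it from over-fitting to the preference pairs.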

Automatic xxxTrainer tagging on the Hub

TRL trainers now automatically add the corresponding tag (trl-sft, trl-dpo, or trl-ddpo) when pushing models to the Hub.

unsloth 🤝 TRL

We encourage users to try the unsloth library for faster LLM fine-tuning with PEFT and TRL's SFTTrainer and DPOTrainer.

What's Changed

New Contributors

Full Changelog: v0.7.4...v0.7.5
