github huggingface/trl v0.8.2
v0.8.2: ORPO & CPO Trainer / Vision LLMs support for `SFTTrainer`, KTO fixes

2 months ago

ORPO Trainer & Vision LLMs support for SFTTrainer, KTO fixes

This release includes two new trainers: ORPO (from KAIST) and CPO.
It also adds support for fine-tuning vision LLMs such as Llava with SFTTrainer; see https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py for details.

ORPO Trainer
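
For readers new to ORPO, here is a minimal scalar sketch of the objective as described in the ORPO paper: the usual SFT loss on the chosen response plus a weighted odds-ratio penalty. It is not the TRL implementation, which works on batched tensors and may differ in details. The function names, the `lam` parameter, and the numbers are hypothetical, chosen only for illustration.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) given log p, for p strictly between 0 and 1."""
    # avg_logp is assumed to be the length-normalized log-likelihood (< 0).
    return avg_logp - math.log1p(-math.exp(avg_logp))


def odds_ratio_loss(chosen_avg_logp: float, rejected_avg_logp: float) -> float:
    """Return -log(sigmoid(log odds ratio)) for one chosen/rejected pair."""
    log_or = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    # -log(sigmoid(x)) == softplus(-x), written in a numerically stable form.
    return max(-log_or, 0.0) + math.log1p(math.exp(-abs(log_or)))


def orpo_loss(nll_chosen: float, chosen_avg_logp: float,
              rejected_avg_logp: float, lam: float = 0.1) -> float:
    """SFT loss on the chosen response plus lam times the odds-ratio term."""
    return nll_chosen + lam * odds_ratio_loss(chosen_avg_logp, rejected_avg_logp)


# Hypothetical numbers: the chosen response is more likely than the rejected one.
loss_good = orpo_loss(nll_chosen=0.5, chosen_avg_logp=-0.5, rejected_avg_logp=-2.0)
# Same SFT loss, but the model prefers the rejected response.
loss_bad = orpo_loss(nll_chosen=0.5, chosen_avg_logp=-2.0, rejected_avg_logp=-0.5)
print(loss_good < loss_bad)
```

The penalty shrinks as the model assigns higher odds to the chosen response than to the rejected one, which is why no separate reference model appears in the sketch.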

CPO Trainer

VLLMs support for SFTTrainer

You can now use SFTTrainer to fine-tune vision LLMs (VLLMs) such as Llava!
See https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py for more details.

KTO Fixes

This release includes several fixes and improvements for the KTOTrainer:

  • Update KTO example to use better model and ChatML support by @lewtun in #1485
  • [KTO] Use batching to speed up data processing by @lewtun in #1470
  • Update KTO example with good dataset & chat format by @lewtun in #1481
  • [KTO] fix interleaving, reporting, hanging bugs by @kawine in #1499
  • [KTO] fix metric logging by @claralp in #1514
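
Several of the items above touch the KTO example's dataset handling. As a rough illustration, assuming the unpaired row format that KTOTrainer documents (`prompt`, `completion`, and a boolean `label`), a paired preference record can be split into two KTO rows. The helper `paired_to_kto` and the sample record are hypothetical and are not part of TRL.

```python
from typing import Dict, List, Union

KtoRow = Dict[str, Union[str, bool]]


def paired_to_kto(record: Dict[str, str]) -> List[KtoRow]:
    """Split one prompt/chosen/rejected record into two unpaired KTO rows."""
    return [
        # The preferred completion becomes a desirable example.
        {"prompt": record["prompt"], "completion": record["chosen"], "label": True},
        # The dispreferred completion becomes an undesirable example.
        {"prompt": record["prompt"], "completion": record["rejected"], "label": False},
    ]


# Hypothetical paired record, for illustration only.
pair = {
    "prompt": "What is 2 + 2?",
    "chosen": "2 + 2 equals 4.",
    "rejected": "2 + 2 equals 5.",
}
rows = paired_to_kto(pair)
for row in rows:
    print(row)
```

Because KTO does not need matched pairs, rows produced this way can be shuffled or mixed with examples that only have a single labeled completion.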

10x PPO!

Other fixes

New Contributors

Full Changelog: v0.8.1...v0.8.2
