v0.7.3: `IterativeTrainer`, NEFTune and major bugfixes for `DPOTrainer` and Distributed Training


In this release we introduce two new features, `IterativeTrainer` (contributed by @gaetanlop) and NEFTune, together with important bugfixes for distributed training.

IterativeTrainer

Iterative fine-tuning is a training method that lets you perform custom actions (for example, generation and filtering) between optimization steps. TRL provides an easy-to-use API to fine-tune your models iteratively in just a few lines of code.

Read more about it here: https://huggingface.co/docs/trl/iterative_sft_trainer
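As a rough illustration, here is a minimal sketch of such a loop, assuming the `IterativeSFTTrainer` API described in the docs linked above; the model choice, the generation step and the toy filtering rule are illustrative, not part of the release notes:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import IterativeSFTTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

trainer = IterativeSFTTrainer(model=model, tokenizer=tokenizer)

for _ in range(3):
    # Custom action between optimization steps: generate completions
    # and keep only those that pass a (toy) filter.
    inputs = tokenizer(["The movie was"], return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20, do_sample=True)
    texts = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    texts = [t for t in texts if "good" in t]  # placeholder filtering rule
    if texts:
        # One optimization step on the curated texts.
        trainer.step(texts=texts)
```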

NEFTune

NEFTune is a technique to boost the performance of chat models, introduced in the paper “NEFTune: Noisy Embeddings Improve Instruction Finetuning” by Jain et al. It consists of adding random noise to the embedding vectors during training; according to the paper, this simple augmentation can improve instruction fine-tuning performance, sometimes dramatically.

Read more about it here
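In TRL, NEFTune is exposed through the `neftune_noise_alpha` argument of the `SFTTrainer`. The snippet below is a minimal sketch; the dataset, model name and alpha value are illustrative:

```python
from datasets import load_dataset
from trl import SFTTrainer

dataset = load_dataset("imdb", split="train")

trainer = SFTTrainer(
    model="facebook/opt-350m",     # model name or a preloaded model
    train_dataset=dataset,
    dataset_text_field="text",     # dataset column holding the raw text
    max_seq_length=512,
    neftune_noise_alpha=5,         # noise scale; the paper evaluates 5, 10, 15
)
trainer.train()
```

Under the hood, the trainer hooks the embedding layer during training to add uniform noise scaled by `alpha / sqrt(L * d)` (sequence length times embedding dimension) and removes the hook afterwards, so inference is unaffected.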

Major bugfixes

Major bugs affecting distributed training and gradient checkpointing have been fixed (see the configuration sketch after the list below).

  • [DPO] fix DPO + GC issues by @younesbelkada in #927
  • [core / DDP] Fix RM trainer + DDP + quantization + propagate gradient_checkpointing_kwargs in SFT & DPO by @younesbelkada in #912
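For context, the `gradient_checkpointing_kwargs` mentioned above are typically set on `transformers.TrainingArguments` and forwarded by the SFT and DPO trainers to the model. A minimal sketch, assuming a transformers version that supports this argument:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./out",
    gradient_checkpointing=True,
    # Non-reentrant checkpointing sidesteps several DDP incompatibilities.
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
```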

DPOTrainer enhancements and fixes

The `DPOTrainer` now comes with multiple enhancements and bugfixes; check them out below.
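For readers new to the trainer, here is a minimal, self-contained sketch of a `DPOTrainer` setup for this release line; the toy preference data and model choice are illustrative:

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")
ref_model = AutoModelForCausalLM.from_pretrained("gpt2")  # frozen reference
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Toy preference data; DPO expects "prompt", "chosen" and "rejected" columns.
dataset = Dataset.from_dict({
    "prompt": ["What is 2 + 2?"],
    "chosen": [" 2 + 2 = 4."],
    "rejected": [" 2 + 2 = 5."],
})

training_args = TrainingArguments(
    output_dir="./dpo-out",
    per_device_train_batch_size=1,
    remove_unused_columns=False,  # keep the raw preference columns
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    beta=0.1,               # temperature of the DPO loss
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```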

What's Changed

New Contributors

Full Changelog: v0.7.2...v0.7.3
