v0.7.3: `IterativeTrainer`, NEFTune and major bugfixes for `DPOTrainer` and Distributed Training


In this release we introduce two new features, `IterativeTrainer` (contributed by @gaetanlop) and NEFTune, together with important bugfixes for distributed training.

IterativeTrainer

Iterative fine-tuning is a training method that lets you perform custom actions (for example, generation and filtering) between optimization steps. TRL provides an easy-to-use API to fine-tune your models iteratively in just a few lines of code.

Read more about it here: https://huggingface.co/docs/trl/iterative_sft_trainer
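As a rough illustration, here is a minimal sketch of such a loop, assuming the `IterativeSFTTrainer` API described in the docs linked above; the model choice, the generation step and the toy filtering rule are illustrative, not part of the release notes:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import IterativeSFTTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

trainer = IterativeSFTTrainer(model=model, tokenizer=tokenizer)

for _ in range(3):
    # Custom action between optimization steps: generate completions
    # and keep only those that pass a (toy) filter.
    inputs = tokenizer(["The movie was"], return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20, do_sample=True)
    texts = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    texts = [t for t in texts if "good" in t]  # placeholder filtering rule
    if texts:
        # One optimization step on the curated texts.
        trainer.step(texts=texts)
```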

NEFTune

NEFTune is a technique to boost the performance of chat models, introduced in the paper “NEFTune: Noisy Embeddings Improve Instruction Finetuning” by Jain et al. It consists of adding random noise to the embedding vectors during training; according to the paper, this simple augmentation can improve instruction fine-tuning performance, sometimes dramatically.

Read more about it here
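In TRL, NEFTune is exposed through the `neftune_noise_alpha` argument of the `SFTTrainer`. The snippet below is a minimal sketch; the dataset, model name and alpha value are illustrative:

```python
from datasets import load_dataset
from trl import SFTTrainer

dataset = load_dataset("imdb", split="train")

trainer = SFTTrainer(
    model="facebook/opt-350m",     # model name or a preloaded model
    train_dataset=dataset,
    dataset_text_field="text",     # dataset column holding the raw text
    max_seq_length=512,
    neftune_noise_alpha=5,         # noise scale; the paper evaluates 5, 10, 15
)
trainer.train()
```

Under the hood, the trainer hooks the embedding layer during training to add uniform noise scaled by `alpha / sqrt(L * d)` (sequence length times embedding dimension) and removes the hook afterwards, so inference is unaffected.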

Major bugfixes

Major bugs affecting distributed training and gradient checkpointing have been fixed (see the configuration sketch after the list below).

  • [DPO] fix DPO + GC issues by @younesbelkada in #927
  • [core / DDP] Fix RM trainer + DDP + quantization + propagate gradient_checkpointing_kwargs in SFT & DPO by @younesbelkada in #912
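For context, the `gradient_checkpointing_kwargs` mentioned above are typically set on `transformers.TrainingArguments` and forwarded by the SFT and DPO trainers to the model. A minimal sketch, assuming a transformers version that supports this argument:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./out",
    gradient_checkpointing=True,
    # Non-reentrant checkpointing sidesteps several DDP incompatibilities.
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
```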

DPOTrainer enhancements and fixes

The `DPOTrainer` now comes with multiple enhancements and bugfixes; check them out below.
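For readers new to the trainer, here is a minimal, self-contained sketch of a `DPOTrainer` setup for this release line; the toy preference data and model choice are illustrative:

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")
ref_model = AutoModelForCausalLM.from_pretrained("gpt2")  # frozen reference
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Toy preference data; DPO expects "prompt", "chosen" and "rejected" columns.
dataset = Dataset.from_dict({
    "prompt": ["What is 2 + 2?"],
    "chosen": [" 2 + 2 = 4."],
    "rejected": [" 2 + 2 = 5."],
})

training_args = TrainingArguments(
    output_dir="./dpo-out",
    per_device_train_batch_size=1,
    remove_unused_columns=False,  # keep the raw preference columns
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    beta=0.1,               # temperature of the DPO loss
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```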

What's Changed

New Contributors

Full Changelog: v0.7.2...v0.7.3
