IterativeTrainer, NEFTune and major bugfixes for DPOTrainer and Distributed Training
In this release we introduce two new features: IterativeTrainer from @gaetanlop and NEFTune, together with important bugfixes for distributed training.
IterativeTrainer
Iterative fine-tuning is a training method that lets you perform custom actions (generation and filtering, for example) between optimization steps. In TRL we provide an easy-to-use API to fine-tune your models iteratively in just a few lines of code; a minimal sketch follows the PR link below.
Read more about it here: https://huggingface.co/docs/trl/iterative_sft_trainer
- Introducing the Iterative Trainer by @gaetanlop in #737
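To give an idea of the workflow, here is a minimal sketch. The model name, outer loop, and curated texts are placeholders; refer to the linked documentation for the exact `step()` signature.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import IterativeSFTTrainer

# Placeholder model, chosen only for illustration
model_name = "facebook/opt-350m"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

trainer = IterativeSFTTrainer(
    model=model,
    args=TrainingArguments(output_dir="iterative-sft", per_device_train_batch_size=2),
    tokenizer=tokenizer,
)

for _ in range(10):  # your own outer loop
    # Custom actions go here: generate candidates, filter or rank them, etc.
    texts = ["### Question: ...\n### Answer: ..."]  # placeholder curated samples
    # Run one optimization pass on the curated texts
    trainer.step(texts=texts)
```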
NEFTune
NEFTune is a technique to boost the performance of chat models, introduced in the paper “NEFTune: Noisy Embeddings Improve Instruction Finetuning” by Jain et al. It consists of adding noise to the embedding vectors during training. A usage sketch is shown after the PR list below.
- [SFTTrainer] Adds NEFTune into SFTTrainer by @younesbelkada in #871
- [NEFTune] Make use of forward hooks instead by @younesbelkada in #889
- Generalize NEFTune for FSDP, DDP, ... by @younesbelkada in #924
Read more about it in the SFTTrainer documentation: https://huggingface.co/docs/trl/sft_trainer
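As a quick illustration, here is a minimal sketch of enabling NEFTune through the `neftune_noise_alpha` argument of `SFTTrainer`; the dataset and model are placeholders chosen for brevity.

```python
from datasets import load_dataset
from trl import SFTTrainer

# Placeholder dataset, chosen only for illustration
dataset = load_dataset("imdb", split="train")

trainer = SFTTrainer(
    "facebook/opt-350m",          # placeholder model
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,
    neftune_noise_alpha=5,        # adds noise to the embedding vectors during training
)
trainer.train()
```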
Major bugfixes
Several major bugs have been fixed to address issues with distributed training and gradient checkpointing; a short configuration sketch follows the list.
- [DPO] fix DPO + GC issues by @younesbelkada in #927
- [core / DDP] Fix RM trainer + DDP + quantization + propagate gradient_checkpointing_kwargs in SFT & DPO by @younesbelkada in #912
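As a rough illustration of the `gradient_checkpointing_kwargs` propagation, the sketch below assumes a `transformers` version whose `TrainingArguments` exposes that argument; the values shown are placeholders.

```python
from transformers import TrainingArguments

# Hypothetical settings; the fix ensures these kwargs now reach the
# underlying model when used with SFTTrainer / DPOTrainer
training_args = TrainingArguments(
    output_dir="out",
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
```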
DPOTrainer enhancements and fixes
The DPOTrainer now comes with multiple enhancements and bugfixes! Check them out below; a usage sketch follows the list.
- [DPO] add SLiC hinge loss to DPOTrainer by @kashif in #866
- Fix DPOTrainer + PEFT by @younesbelkada in #941
- [DPO] Merge initial peft model if trainer has a peft_config by @kashif in #956
- Adds model kwargs to SFT and DPO trainers by @edbeeching in #951
- fix: dpo trainer ds config by @mengban in #957
- hotfix for dpo trainer by @mnoukhov in #919
- Fix dpo_llama2.py by @younesbelkada in #934
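For instance, the SLiC hinge loss can be selected via `loss_type="hinge"`. The sketch below is a minimal, self-contained example with a toy preference dataset and a placeholder model; the hyperparameters are illustrative only.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

# Placeholder model and a toy preference dataset for illustration
model_name = "facebook/opt-350m"
model = AutoModelForCausalLM.from_pretrained(model_name)
model_ref = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

dpo_dataset = Dataset.from_dict({
    "prompt": ["How are you?"],
    "chosen": ["I'm doing well, thank you!"],
    "rejected": ["No comment."],
})

trainer = DPOTrainer(
    model,
    model_ref,
    args=TrainingArguments(
        output_dir="dpo",
        per_device_train_batch_size=1,
        remove_unused_columns=False,
    ),
    beta=0.1,
    loss_type="hinge",          # SLiC-style hinge loss (#866)
    train_dataset=dpo_dataset,  # expects "prompt", "chosen" and "rejected" columns
    tokenizer=tokenizer,
    max_length=512,
    max_prompt_length=128,
)
trainer.train()
```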
What's Changed
- Release: v0.7.2 by @younesbelkada in #863
- set dev version by @younesbelkada in #864
- Remove duplicate key in reward_modeling.py by @vwxyzjn in #890
- fix peft_config type by @u2takey in #883
- fix: remove useless token by @rtrompier in #896
- [reward_modeling] Cleaning example script by @gaetanlop in #882
- Fix couple wrong links on lib homepage by @paulbricman in #908
- Add whiten ops before compute advantages by @SingL3 in #887
- Fix broken link/markdown by @osanseviero in #903
- [Update reward_trainer.py] append PeftSavingCallback if callbacks is not None by @zuoxingdong in #910
- deactivate MacOS CI by @lvwerra in #913
- fix stackllama2 sft gradient checkpointing by @nrailg in #906
- updating PPOTrainer docstring by @lomahony in #897
- Bump minimum tyro version by @brentyi in #928
- [Feature] Enable Intel XPU support by @abhilash1910 in #839
- [SFTTrainer] Make sure to not conflict between transformers and TRL implementation by @younesbelkada in #933
- Fix stale bot by @younesbelkada in #935
- Optionally logging reference response by @vwxyzjn in #847
- [CI] Fix CI with new transformers release by @younesbelkada in #946
- Fix unwrapping peft models by @kkteru in #948
- Added support for custom EncoderDecoder models by @ribesstefano in #911
New Contributors
- @u2takey made their first contribution in #883
- @rtrompier made their first contribution in #896
- @paulbricman made their first contribution in #908
- @SingL3 made their first contribution in #887
- @nrailg made their first contribution in #906
- @lomahony made their first contribution in #897
- @brentyi made their first contribution in #928
- @abhilash1910 made their first contribution in #839
- @kkteru made their first contribution in #948
- @ribesstefano made their first contribution in #911
- @mengban made their first contribution in #957
Full Changelog: v0.7.2...v0.7.3