IPO & KTO & cDPO loss, DPOTrainer enhancements, automatic tags for xxxTrainer
Important enhancements for DPOTrainer
This release introduces many new features in TRL for `DPOTrainer`:

- IPO loss for better generalization of the DPO algorithm
- KTO & cDPO loss
- You can also pass pre-computed reference model log-probabilities to `DPOTrainer`
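The new losses all act on the same per-pair quantity: the difference between the policy's and the reference model's chosen-vs-rejected log-ratios. As a rough illustration only (a simplified sketch with illustrative names, not TRL's actual implementation), the standard DPO (sigmoid) loss, its label-smoothed cDPO variant, and the IPO loss for a single preference pair can be written as:

```python
import math

def log_sigmoid(x: float) -> float:
    # log(sigmoid(x)) = -log(1 + exp(-x)), split for numerical stability
    return x - math.log1p(math.exp(x)) if x < 0 else -math.log1p(math.exp(-x))

def preference_losses(policy_logratio: float, ref_logratio: float,
                      beta: float = 0.1, label_smoothing: float = 0.0):
    """Simplified per-pair losses; argument names are illustrative.

    policy_logratio = log pi(chosen) - log pi(rejected) under the policy,
    ref_logratio    = the same quantity under the reference model.
    """
    logits = policy_logratio - ref_logratio
    # Standard DPO loss; label_smoothing > 0 gives the cDPO variant,
    # which treats preference labels as noisy with that probability.
    sigmoid_loss = (-(1 - label_smoothing) * log_sigmoid(beta * logits)
                    - label_smoothing * log_sigmoid(-beta * logits))
    # IPO instead regresses the log-ratio gap towards 1 / (2 * beta).
    ipo_loss = (logits - 1 / (2 * beta)) ** 2
    return sigmoid_loss, ipo_loss
```

In `DPOTrainer` these variants are selected through its `loss_type` argument; see the PRs below for the exact option names and signatures.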
- [DPO] Refactor eval logging of dpo trainer by @mnoukhov in #954
- Fixes reward and text gathering in distributed training by @edbeeching in #850
- remove spurious optimize_cuda_cache deprecation warning on init by @ChanderG in #1045
- Revert "[DPO] Refactor eval logging of dpo trainer (#954)" by @lvwerra in #1047
- Fix DPOTrainer + PEFT 2 by @rdk31 in #1049
- [DPO] IPO Training loss by @kashif in #1022
- [DPO] cDPO loss by @kashif in #1035
- [DPO] use ref model logprobs if it exists in the data by @kashif in #885
- [DPO] save eval_dataset for subsequent calls by @kashif in #1125
- [DPO] rename kto loss by @kashif in #1127
- [DPO] add KTO loss by @kashif in #1075
Automatic `xxxTrainer` tagging on the Hub

Trainers from TRL now automatically push the tags `trl-sft`, `trl-dpo`, and `trl-ddpo` when pushing models to the Hub.

- [`xxxTrainer`] Add tags to all trainers in TRL by @younesbelkada in #1120
unsloth 🤝 TRL
We encourage users to try out the unsloth library for faster LLM fine-tuning with PEFT and TRL's `SFTTrainer` and `DPOTrainer`
- [Docs] Add unsloth optimizations in TRL's documentation by @younesbelkada in #1119
What's Changed
- set dev version by @younesbelkada in #970
- [Tests] Add non optional packages tests by @younesbelkada in #974
- [DOCS] Fix outdated references to `examples/` by @alvarobartt in #977
- Update README.md by @GeekDream-x in #994
- [`DataCollatorForCompletionOnlyLM`] Warn on identical `eos_token_id` and `pad_token_id` by @MustSave in #988
- [`DataCollatorForCompletionOnlyLM`] Add more clarification / guidance in the case `tokenizer.pad_token_id == tokenizer.eos_token_id` by @younesbelkada in #992
- make distributed true for multiple process by @allanj in #997
- Fixed wrong trigger for warning by @zabealbe in #971
- Update how_to_train.md by @halfrot in #1003
- Adds `requires_grad` to input for non-quantized peft models by @younesbelkada in #1006
- [Multi-Adapter PPO] Fix and Refactor reward model adapter by @mnoukhov in #982
- Remove duplicate data loading in rl_training.py by @viethoangtranduong in #1020
- [Document] Minor fixes of sft_trainer document by @mutichung in #1029
- Update utils.py by @ZihanWang314 in #1012
- spelling is hard by @grahamannett in #1043
- Fixing accelerator version function call. by @ParthaEth in #1056
- [SFT Trainer] precompute packed iterable into a dataset by @lvwerra in #979
- Update doc CI by @lewtun in #1060
- Improve PreTrainedModelWrapper._get_current_device by @billvsme in #1048
- Update doc for the `compute_metrics` argument of SFTTrainer by @albertauyeung in #1062
- [`core`] Fix failing tests on main by @younesbelkada in #1065
- [`SFTTrainer`] Fix Trainer when args is None by @younesbelkada in #1064
- enable multiple eval datasets by @peter-sk in #1052
- Add missing `loss_type` in `ValueError` message by @alvarobartt in #1067
- Add args to SFT example by @lewtun in #1079
- add local folder support as input for rl_training. by @sywangyi in #1078
- Make CI happy by @younesbelkada in #1080
- Removing `tyro` in `sft_llama2.py` by @vwxyzjn in #1081
- Log arg consistency by @tcapelle in #1084
- Updated documentation for docs/source/reward_trainer.mdx to import th… by @cm2435 in #1092
- [Feature] Add Ascend NPU accelerator support by @statelesshz in #1096
- `peft_module_casting_to_bf16` util method, `append_concat_token` flag, remove callback `PeftSavingCallback` by @pacman100 in #1110
- Make prepending of bos token configurable. by @pacman100 in #1114
- fix gradient checkpointing when using PEFT by @pacman100 in #1118
- Update `description` in `setup.py` by @alvarobartt in #1101
New Contributors
- @alvarobartt made their first contribution in #977
- @GeekDream-x made their first contribution in #994
- @MustSave made their first contribution in #988
- @allanj made their first contribution in #997
- @zabealbe made their first contribution in #971
- @viethoangtranduong made their first contribution in #1020
- @mutichung made their first contribution in #1029
- @ZihanWang314 made their first contribution in #1012
- @grahamannett made their first contribution in #1043
- @ChanderG made their first contribution in #1045
- @rdk31 made their first contribution in #1049
- @ParthaEth made their first contribution in #1056
- @billvsme made their first contribution in #1048
- @albertauyeung made their first contribution in #1062
- @peter-sk made their first contribution in #1052
- @sywangyi made their first contribution in #1078
- @tcapelle made their first contribution in #1084
- @cm2435 made their first contribution in #1092
- @statelesshz made their first contribution in #1096
- @pacman100 made their first contribution in #1110
Full Changelog: v0.7.4...v0.7.5