v0.7.8: Unsloth tag, DPO fixes, PEFT support for DDPO
Unsloth tag for xxxTrainer
If users fine-tune with the Unsloth library, an `unsloth` tag is now automatically pushed to the Hub.
- [xxxTrainer] Add unsloth tag by @younesbelkada in #1130
DPO fixes
Some important fixes for DPO have been introduced to address https://twitter.com/jon_durbin/status/1743575483365699809 and to make DPO faster:
- Allow separate devices for target/ref models. by @jondurbin in #1190
- Allow swapping PEFT adapters for target/ref model. by @jondurbin in #1193
- Change device access order for speedup of calculating metrics in DPOTrainer by @brcps12 in #1154
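The "separate devices" fix above lets the policy and the frozen reference model live on different devices, with log-probs moved to a common device before the DPO loss is computed. The following is a minimal plain-PyTorch sketch of that idea, not TRL's actual implementation; the `"cpu"` placements stand in for, e.g., `"cuda:0"` and `"cuda:1"` on a multi-GPU machine.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the policy and the frozen reference model.
policy = nn.Linear(4, 2).to("cpu")  # e.g. .to("cuda:0")
ref = nn.Linear(4, 2).to("cpu")     # e.g. .to("cuda:1")
for p in ref.parameters():
    p.requires_grad_(False)         # the reference model stays frozen

x = torch.randn(3, 4)

# Each model runs on whatever device it lives on...
policy_out = policy(x.to(next(policy.parameters()).device))
ref_out = ref(x.to(next(ref.parameters()).device))

# ...and the outputs are brought to one device before computing the margin.
margin = policy_out - ref_out.to(policy_out.device)
print(margin.shape)  # torch.Size([3, 2])
```

This keeps the reference model's memory off the training GPU at the cost of one cross-device transfer per batch.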
DDPO + PEFT
DDPO now supports PEFT.
- add: support for `peft` in ddpo. by @sayakpaul in #1165
Other fixes
- add peft_module_casting_to_bf16 in DPOTrainer by @sywangyi in #1143
- SFT Tokenizer Fix by @ChrisCates in #1142
- Minor fixes to some comments in some examples. by @mattholl in #1156
- Correct shapes in docstring of PPOTrainer's train_minibatch method by @nikihowe in #1170
- Update sft_trainer.py by @Hemanthkumar2112 in #1162
- Fix batch all gather by @vwxyzjn in #1177
- Address issue #1122 by @maneandrea in #1174
- Fix misleading variable "epoch" from the training loop from PPOTrainer Doc. by @Jfhseh in #1171
- SFTTrainer: follow args.remove_unused_columns by @mgerstgrasser in #1188
- Handle last token from generation prompt by @pablovicente in #1153
New Contributors
- @ChrisCates made their first contribution in #1142
- @brcps12 made their first contribution in #1154
- @mattholl made their first contribution in #1156
- @sayakpaul made their first contribution in #1165
- @nikihowe made their first contribution in #1170
- @Hemanthkumar2112 made their first contribution in #1162
- @maneandrea made their first contribution in #1174
- @Jfhseh made their first contribution in #1171
- @mgerstgrasser made their first contribution in #1188
- @pablovicente made their first contribution in #1153
- @jondurbin made their first contribution in #1190
Full Changelog: v0.7.7...v0.7.8