v0.7.8: Unsloth tag, DPO fixes, PEFT support for DDPO

Unsloth tag for `xxxTrainer`

If you use the Unsloth library, the `unsloth` tag is automatically pushed to the Hub.

Several important DPO fixes have been introduced to address the issue reported in https://twitter.com/jon_durbin/status/1743575483365699809 and to make DPO faster:

Allow separate devices for target/ref models. by @jondurbin in #1190
Allow swapping PEFT adapters for target/ref model. by @jondurbin in #1193
Change device access order for speedup of calculating metrics in DPOTrainer by @brcps12 in #1154
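PR #1193 lets the policy and the reference model share one base model and differ only in which PEFT adapter is active. The toy sketch that follows illustrates that pattern under stated assumptions. `ToyAdapterModel`, `use_adapter` and `policy_and_reference_outputs` are hypothetical names, the adapter names `"train"` and `"reference"` are made up, and each "adapter" is just a scale factor. This is not the DPOTrainer implementation. It only shows the idea of activating the reference adapter for the reference pass and restoring the policy adapter afterwards (PEFT models expose a `set_adapter` method for switching the active adapter).

```python
from contextlib import contextmanager


class ToyAdapterModel:
    """Hypothetical stand-in for one base model carrying several named adapters."""

    def __init__(self, adapter_scales):
        # Each "adapter" is only a scale factor here, purely for illustration.
        self.adapter_scales = dict(adapter_scales)
        self.active_adapter = next(iter(self.adapter_scales))

    def set_adapter(self, name):
        if name not in self.adapter_scales:
            raise KeyError(f"unknown adapter: {name}")
        self.active_adapter = name

    def forward(self, x):
        return x * self.adapter_scales[self.active_adapter]


@contextmanager
def use_adapter(model, ref_name, model_name):
    """Temporarily activate the reference adapter, then restore the policy adapter."""
    model.set_adapter(ref_name)
    try:
        yield model
    finally:
        # Restore the trainable adapter even if the reference pass raises.
        model.set_adapter(model_name)


def policy_and_reference_outputs(model, x, model_name="train", ref_name="reference"):
    """Run the policy pass and the reference pass on the same shared base model."""
    model.set_adapter(model_name)
    policy_out = model.forward(x)
    with use_adapter(model, ref_name, model_name):
        ref_out = model.forward(x)
    return policy_out, ref_out
```

For example, with `ToyAdapterModel({"train": 2.0, "reference": 1.0})` and input `3.0`, the policy pass returns `6.0`, the reference pass returns `3.0`, and the `"train"` adapter is active again afterwards. Sharing a base model this way avoids keeping a second full copy of the weights for the reference model.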

DDPO now supports PEFT.

add peft_module_casting_to_bf16 in DPOTrainer by @sywangyi in #1143
SFT Tokenizer Fix by @ChrisCates in #1142
Minor fixes to some comments in some examples. by @mattholl in #1156
Correct shapes in docstring of PPOTrainer's train_minibatch method by @nikihowe in #1170
Update sft_trainer.py by @Hemanthkumar2112 in #1162
Fix batch all gather by @vwxyzjn in #1177
Address issue #1122 by @maneandrea in #1174
Fix misleading variable "epoch" from the training loop from PPOTrainer Doc. by @Jfhseh in #1171
SFTTrainer: follow args.remove_unused_columns by @mgerstgrasser in #1188
Handle last token from generation prompt by @pablovicente in #1153

Full Changelog: v0.7.7...v0.7.8