ORPO Trainer & Vision LLMs support for `SFTTrainer`, KTO fixes

This release adds two new trainers: ORPO (from KAIST) and CPO.
It also adds support for vision LLMs such as Llava in SFTTrainer. See https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py for more details.

ORPO and CPO Trainers

Add CPOTrainer by @fe1ixxu in #1382
Add use_cache=False in {ORPO,CPO}Trainer.concatenated_forward by @alvarobartt in #1478
[ORPO] Update NLL loss to use input_ids instead by @alvarobartt in #1516
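
For readers new to ORPO, the objective can be sketched in a few lines of plain Python. This is a minimal illustration of the formula from the ORPO paper (an NLL term on the chosen response plus a weighted odds-ratio penalty), not the code in ORPOTrainer. The function names and the `lam` weight are hypothetical, and the inputs are assumed to be length-averaged log-probabilities that are strictly negative.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def log_odds(logp: float) -> float:
    """log(p / (1 - p)) for p = exp(logp); requires logp < 0."""
    return logp - math.log1p(-math.exp(logp))


def orpo_loss(chosen_logp: float, rejected_logp: float, nll: float, lam: float = 0.1) -> float:
    """Illustrative ORPO objective: NLL + lam * odds-ratio loss.

    chosen_logp / rejected_logp: average per-token log-probabilities of the
    chosen and rejected responses (assumed inputs, both < 0).
    nll: negative log-likelihood of the chosen response.
    lam: hypothetical name for the weight on the odds-ratio term.
    """
    log_odds_ratio = log_odds(chosen_logp) - log_odds(rejected_logp)
    odds_ratio_loss = -log_sigmoid(log_odds_ratio)
    return nll + lam * odds_ratio_loss


# The penalty shrinks when the chosen response is more likely than the rejected one.
good = orpo_loss(chosen_logp=-0.5, rejected_logp=-2.0, nll=1.0)
bad = orpo_loss(chosen_logp=-2.0, rejected_logp=-0.5, nll=1.0)
print(good < bad)
```

In this sketch the odds-ratio term only pushes the chosen response's odds above the rejected one's, while the NLL term keeps the model fitted to the chosen response. No reference model is involved.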

You can now use SFTTrainer to fine-tune vision LLMs (VLLMs) such as Llava!
See https://github.com/huggingface/trl/blob/main/examples/scripts/vsft_llava.py for more details.

Many fixes were introduced for the KTOTrainer.

set dev version by @younesbelkada in #1463
Use the standard dataset for DPO CLI by @vwxyzjn in #1456
[peft] Update test_reward_trainer.py to fix tests by @kashif in #1471
Fix hyperparameters in KTO example by @lewtun in #1474
docs: add missing Trainer classes and sort alphabetically by @anakin87 in #1479
hackey update to ModelConfig to allow lora_target_modules="all-linear" by @galtay in #1488
Ignore chat files by @lewtun in #1486
Add DPO link in README by @qgallouedec in #1502
Fix typo in how_to_train.md by @ftorres16 in #1503
Fix DPO Unsloth example in Docs by @arnavgarg1 in #1494
Correct ppo_epochs usage by @muhammed-shihebi in #1480
Fix RichProgressCallback by @eggry in #1496
Change the device index to device:index by @yuanwu2017 in #1490
FIX: use kwargs for RMTrainer by @younesbelkada in #1515
Allow streaming (datasets.IterableDataset) by @BramVanroy in #1468
Allow pre-tokenized datasets in SFTTrainer by @BramVanroy in #1520
[DOC] Add data description for sfttrainer doc by @BramVanroy in #1521
Release: v0.8.2 by @younesbelkada in #1522

Full Changelog: v0.8.1...v0.8.2