Patch release: `SFTTrainer` and `PPOTrainer` bug fixes
What's Changed
- Make shuffle optional by @lopez-hector in #457
- Pre-commit by @vwxyzjn in #448
- Debug the tortuous logic in `_prepare_dataset` function by @BeibinLi in #464
- [`CI`] Fix CI RM by @younesbelkada in #468
- Update sft_trainer.py by @JulesGM in #474
- Refactor README by @younesbelkada in #460
- add ratio threshold to avoid spikes by @lvwerra in #488 (sketched together with #486 after this list)
- fix typo in reward_modeling.py by @csyourui in #494
- FIX: contributing guidelines command by @BramVanroy in #493
- Remove padding in batched generation. by @lvwerra in #487
- Adds some options to stabilize the KL penalty by @edbeeching in #486
- correctly implement gradient checkpointing to multi-adapter example by @mnoukhov in #479
- Disable mlm by default in `DataCollatorForCompletionOnlyLM`, add `ignore_index` and docstring by @BramVanroy in #476 (see the collator sketch after this list)
- Use `float` instead of `double` to avoid issues with MPS device by @younesbelkada in #499
- [`PPOTrainer`] Add prefix tuning support by @younesbelkada in #501
- [`PPOTrainer`] Add prompt tuning support on TRL by @younesbelkada in #500 (both sketched with `peft` after this list)
- [`SFTTrainer`] Fix the sequence length check of `SFTTrainer` by @younesbelkada in #512
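
#488 and #486 add stabilization knobs to `PPOConfig`. A minimal sketch of setting them, assuming the option names `ratio_threshold` and `kl_penalty` from those PRs; the checkpoint name is a placeholder and the values shown are illustrative, not recommendations:

```python
from trl import PPOConfig

# Sketch only: option names follow #488 and #486; check your installed
# version for the exact defaults.
config = PPOConfig(
    model_name="facebook/opt-350m",  # placeholder checkpoint
    # Skip mini-batches whose PPO ratio exceeds this value, which
    # would otherwise cause loss spikes (#488).
    ratio_threshold=10.0,
    # KL penalty formulation: "kl" (classic), "abs", or "mse" (#486).
    kl_penalty="abs",
)
```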
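With #476, `DataCollatorForCompletionOnlyLM` no longer does masked language modeling by default and exposes an `ignore_index` argument. A minimal sketch of constructing it; the checkpoint and response template are placeholder assumptions:

```python
from transformers import AutoTokenizer
from trl import DataCollatorForCompletionOnlyLM

# Placeholder checkpoint; substitute your own.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")

# mlm now defaults to False; ignore_index (default -100) is the label
# value the loss skips, so only tokens after the response template
# contribute to training.
collator = DataCollatorForCompletionOnlyLM(
    response_template="### Answer:",  # placeholder template
    tokenizer=tokenizer,
    ignore_index=-100,
)
# Pass it to SFTTrainer via `data_collator=collator`.
```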
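#501 and #500 extend the `peft` integration to prefix tuning and prompt tuning. A sketch of the usual pattern, assuming the `peft_config` argument of `from_pretrained` and a placeholder checkpoint; swap in `PrefixTuningConfig` for prefix tuning:

```python
from peft import PromptTuningConfig, TaskType
from trl import AutoModelForCausalLMWithValueHead

# Prompt tuning config; use PrefixTuningConfig for prefix tuning.
peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=16,
)

# Placeholder checkpoint; substitute your own.
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    "facebook/opt-350m",
    peft_config=peft_config,
)
# The wrapped model is then passed to PPOTrainer as usual.
```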
New Contributors
- @lopez-hector made their first contribution in #457
- @BeibinLi made their first contribution in #464
- @csyourui made their first contribution in #494
- @BramVanroy made their first contribution in #493
Full Changelog: v0.4.6...v0.4.7