Patch release: `SFTTrainer` and `PPOTrainer` bug fixes
What's Changed
- Make shuffle optional by @lopez-hector in #457
- Pre-commit by @vwxyzjn in #448
- Debug the tortuous logic in `_prepare_dataset` function by @BeibinLi in #464
- [`CI`] Fix CI RM by @younesbelkada in #468
- Update sft_trainer.py by @JulesGM in #474
- Refactor README by @younesbelkada in #460
- add ratio threshold to avoid spikes by @lvwerra in #488 (sketched together with #486 after this list)
- fix typo in reward_modeling.py by @csyourui in #494
- FIX: contributing guidelines command by @BramVanroy in #493
- Remove padding in batched generation. by @lvwerra in #487
- Adds some options to stabilize the KL penalty by @edbeeching in #486
- correctly implement gradient checkpointing to multi-adapter example by @mnoukhov in #479
- Disable mlm by default in `DataCollatorForCompletionOnlyLM`, add `ignore_index` and docstring by @BramVanroy in #476 (see the collator sketch after this list)
- Use `float` instead of `double` to avoid issues with MPS device by @younesbelkada in #499
- [`PPOTrainer`] Add prefix tuning support by @younesbelkada in #501
- [`PPOTrainer`] Add prompt tuning support on TRL by @younesbelkada in #500 (both sketched with `peft` after this list)
- [`SFTTrainer`] Fix the sequence length check of `SFTTrainer` by @younesbelkada in #512
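
#488 and #486 add stabilization knobs to `PPOConfig`. A minimal sketch of setting them, assuming the option names `ratio_threshold` and `kl_penalty` from those PRs; the checkpoint name is a placeholder and the values shown are illustrative, not recommendations:

```python
from trl import PPOConfig

# Sketch only: option names follow #488 and #486; check your installed
# version for the exact defaults.
config = PPOConfig(
    model_name="facebook/opt-350m",  # placeholder checkpoint
    # Skip mini-batches whose PPO ratio exceeds this value, which
    # would otherwise cause loss spikes (#488).
    ratio_threshold=10.0,
    # KL penalty formulation: "kl" (classic), "abs", or "mse" (#486).
    kl_penalty="abs",
)
```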
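With #476, `DataCollatorForCompletionOnlyLM` no longer does masked language modeling by default and exposes an `ignore_index` argument. A minimal sketch of constructing it; the checkpoint and response template are placeholder assumptions:

```python
from transformers import AutoTokenizer
from trl import DataCollatorForCompletionOnlyLM

# Placeholder checkpoint; substitute your own.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")

# mlm now defaults to False; ignore_index (default -100) is the label
# value the loss skips, so only tokens after the response template
# contribute to training.
collator = DataCollatorForCompletionOnlyLM(
    response_template="### Answer:",  # placeholder template
    tokenizer=tokenizer,
    ignore_index=-100,
)
# Pass it to SFTTrainer via `data_collator=collator`.
```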
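#501 and #500 extend the `peft` integration to prefix tuning and prompt tuning. A sketch of the usual pattern, assuming the `peft_config` argument of `from_pretrained` and a placeholder checkpoint; swap in `PrefixTuningConfig` for prefix tuning:

```python
from peft import PromptTuningConfig, TaskType
from trl import AutoModelForCausalLMWithValueHead

# Prompt tuning config; use PrefixTuningConfig for prefix tuning.
peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=16,
)

# Placeholder checkpoint; substitute your own.
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    "facebook/opt-350m",
    peft_config=peft_config,
)
# The wrapped model is then passed to PPOTrainer as usual.
```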
New Contributors
- @lopez-hector made their first contribution in #457
- @BeibinLi made their first contribution in #464
- @csyourui made their first contribution in #494
- @BramVanroy made their first contribution in #493
Full Changelog: v0.4.6...v0.4.7