New Trainer: KTOTrainer:
We recently introduced the KTOTrainer to run the KTO algorithm on LLMs! A minimal usage sketch follows the PR list below.
- fix bugs in KTO implementation by @kawine in #1380
- [KTO] merge eval dataset only if it exists by @kashif in #1383
- [KTO] prevent nans from appearing in metrics by @kawine in #1386
- Kto trainer by @kashif in #1181
- [KTO] fix tokenization bugs by @kawine in #1418
- [KTO] model init when args are given by @kashif in #1413
- [KTO] fix various bugs by @kawine in #1402
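To give a sense of the API, here is a minimal, untested sketch of fine-tuning with KTOTrainer. The model and dataset names, the prompt/completion/label column format, and the KTOConfig fields shown are assumptions drawn from the KTO documentation, not from this changelog:

```python
# Minimal sketch (assumed, not from the release notes): fine-tune with KTOTrainer.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

model_name = "facebook/opt-125m"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# KTO uses unpaired feedback: a prompt, a completion, and a boolean label
# marking the completion as desirable or undesirable (assumed column names).
train_dataset = Dataset.from_dict({
    "prompt": ["What is the capital of France?", "What is 2 + 2?"],
    "completion": [" Paris.", " Five."],
    "label": [True, False],
})

trainer = KTOTrainer(
    model,
    args=KTOConfig(output_dir="opt-kto", per_device_train_batch_size=2),
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

Because KTO only needs a binary desirable/undesirable signal per completion rather than paired preferences, the data requirements are lighter than for pairwise methods such as DPO.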
TRL Command Line Interfaces (CLIs):
Run SFT, DPO and chat with your aligned model directly from the terminal:
SFT:
```bash
trl sft --model_name_or_path facebook/opt-125m --dataset_name imdb --output_dir opt-sft-imdb
```
DPO:
```bash
trl dpo --model_name_or_path facebook/opt-125m --dataset_name trl-internal-testing/Anthropic-hh-rlhf-processed --output_dir opt-sft-hh-rlhf
```
Chat:
```bash
trl chat --model_name_or_path Qwen/Qwen1.5-0.5B-Chat
```
Read more about the CLIs in the relevant documentation section, or use --help for more details.
- FEAT: Add CLIs in TRL ! by @younesbelkada in #1419
- CI / CLI: Properly raise error when CLI tests failed by @younesbelkada in #1446
- chat cli by @lvwerra in #1431
- Fix yaml parsing issue by @younesbelkada in #1450
- `model` --> `model_name_or_path` by @lvwerra in #1452
- FEAT: Update README to add DPO + CLIs by @younesbelkada in #1448
FSDP + QLoRA:
SFTTrainer now supports FSDP + QLoRA; a rough usage sketch follows the PR link below.
- Add support for FSDP+QLoRA and DeepSpeed ZeRO3+QLoRA by @pacman100 in #1416
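As a rough illustration (not taken from the PR itself), the key ingredient for FSDP + QLoRA is loading the base model in 4-bit with a `bnb_4bit_quant_storage` dtype that matches the training dtype, so FSDP can shard the quantized weights, and then training a LoRA adapter with SFTTrainer. All model/dataset names and hyperparameters below are placeholders:

```python
# Assumed sketch of SFTTrainer + QLoRA, intended to be launched under an
# accelerate FSDP config (e.g. `accelerate launch --config_file fsdp.yaml train.py`).
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from trl import SFTTrainer

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    # storing the quantized weights in bf16 lets FSDP wrap and shard them uniformly
    bnb_4bit_quant_storage=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m", quantization_config=bnb_config, torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

trainer = SFTTrainer(
    model=model,
    args=TrainingArguments(output_dir="opt-sft-qlora", per_device_train_batch_size=4, bf16=True),
    train_dataset=load_dataset("imdb", split="train"),
    dataset_text_field="text",
    max_seq_length=512,
    tokenizer=tokenizer,
    peft_config=LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"),
)
trainer.train()
```

The exact accelerate FSDP configuration is not shown here and should be taken from the TRL/PEFT documentation.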
Other fixes
- set dev version by @younesbelkada in #1332
- Update stack llama 2 example to reflect #aa35fec by @nautsimon in #1333
- FIX: More user friendly error when users don't have PEFT by @younesbelkada in #1350
- fix 8-bit multi-gpu training bug by @fancyerii in #1353
- set seed in sft/dpo/reward_modeling to make result reproducible by @sywangyi in #1357
- Fix transformers version checking for Python < 3.8 by @samuki in #1363
- Add some arguments for support XPU by @yuanwu2017 in #1366
- ENH: Send Docker and transformers main CI results on slack after merging on main by @younesbelkada in #1370
- FEAT: [SFTTrainer] Add `eval_packing` by @younesbelkada in #1369
- FEAT: `force_use_ref_model` for power users by @younesbelkada in #1367
- FIX: fix after #1370 by @younesbelkada in #1372
- FIX: Change ci to fail-fast=False by @younesbelkada in #1373
- FIX: Fix the CI again .. by @younesbelkada in #1374
- Log ddpo reward as float to fix numpy conversion during bf16 training by @skavulya in #1391
- Fix the pad_token_id error by @yuanwu2017 in #1394
- FIX [RewardModeling] Fix RM script for PEFT by @younesbelkada in #1393
- Fix import error from deprecation in transformers by @lewtun in #1415
- CI: Fix CI on main by @younesbelkada in #1422
- [Kto] torch_dtype kwargs fix by @kashif in #1429
- Create standard dataset for TRL by @vwxyzjn in #1424
- FIX: fix doc build on main by @younesbelkada in #1437
- Fix PPOTrainer README example by @nikihowe in #1441
- Before update the tr_loss, make sure tr_loss_step is in the same device. by @pengwei715 in #1439
- Release: v0.8.0 by @younesbelkada in #1453
New Contributors
- @nautsimon made their first contribution in #1333
- @fancyerii made their first contribution in #1353
- @samuki made their first contribution in #1363
- @yuanwu2017 made their first contribution in #1366
- @kawine made their first contribution in #1380
- @skavulya made their first contribution in #1391
- @pengwei715 made their first contribution in #1439
Full Changelog: v0.7.11...v0.8.0