New Trainer: KTOTrainer:
We recently introduced the KTOTrainer to run the KTO algorithm on LLMs! A minimal usage sketch follows the PR list below.
- fix bugs in KTO implementation by @kawine in #1380
- [KTO] merge eval dataset only if it exists by @kashif in #1383
- [KTO] prevent nans from appearing in metrics by @kawine in #1386
- Kto trainer by @kashif in #1181
- [KTO] fix tokenization bugs by @kawine in #1418
- [KTO] model init when args are given by @kashif in #1413
- [KTO] fix various bugs by @kawine in #1402
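To give a sense of the API, here is a minimal, untested sketch of fine-tuning with KTOTrainer. The model and dataset names, the prompt/completion/label column format, and the KTOConfig fields shown are assumptions drawn from the KTO documentation, not from this changelog:

```python
# Minimal sketch (assumed, not from the release notes): fine-tune with KTOTrainer.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

model_name = "facebook/opt-125m"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# KTO uses unpaired feedback: a prompt, a completion, and a boolean label
# marking the completion as desirable or undesirable (assumed column names).
train_dataset = Dataset.from_dict({
    "prompt": ["What is the capital of France?", "What is 2 + 2?"],
    "completion": [" Paris.", " Five."],
    "label": [True, False],
})

trainer = KTOTrainer(
    model,
    args=KTOConfig(output_dir="opt-kto", per_device_train_batch_size=2),
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

Because KTO only needs a binary desirable/undesirable signal per completion rather than paired preferences, the data requirements are lighter than for pairwise methods such as DPO.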
TRL Command Line Interfaces (CLIs):
Run SFT, DPO and chat with your aligned model directly from the terminal:
SFT:
```bash
trl sft --model_name_or_path facebook/opt-125m --dataset_name imdb --output_dir opt-sft-imdb
```
DPO:
```bash
trl dpo --model_name_or_path facebook/opt-125m --dataset_name trl-internal-testing/Anthropic-hh-rlhf-processed --output_dir opt-sft-hh-rlhf
```
Chat:
```bash
trl chat --model_name_or_path Qwen/Qwen1.5-0.5B-Chat
```
Read more about the CLIs in the relevant documentation section, or use --help for more details.
- FEAT: Add CLIs in TRL ! by @younesbelkada in #1419
- CI / CLI: Properly raise error when CLI tests failed by @younesbelkada in #1446
- chat cli by @lvwerra in #1431
- Fix yaml parsing issue by @younesbelkada in #1450
- `model` --> `model_name_or_path` by @lvwerra in #1452
- FEAT: Update README to add DPO + CLIs by @younesbelkada in #1448
FSDP + QLoRA:
SFTTrainer now supports FSDP + QLoRA; a rough usage sketch follows the PR link below.
- Add support for FSDP+QLoRA and DeepSpeed ZeRO3+QLoRA by @pacman100 in #1416
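As a rough illustration (not taken from the PR itself), the key ingredient for FSDP + QLoRA is loading the base model in 4-bit with a `bnb_4bit_quant_storage` dtype that matches the training dtype, so FSDP can shard the quantized weights, and then training a LoRA adapter with SFTTrainer. All model/dataset names and hyperparameters below are placeholders:

```python
# Assumed sketch of SFTTrainer + QLoRA, intended to be launched under an
# accelerate FSDP config (e.g. `accelerate launch --config_file fsdp.yaml train.py`).
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from trl import SFTTrainer

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    # storing the quantized weights in bf16 lets FSDP wrap and shard them uniformly
    bnb_4bit_quant_storage=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m", quantization_config=bnb_config, torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

trainer = SFTTrainer(
    model=model,
    args=TrainingArguments(output_dir="opt-sft-qlora", per_device_train_batch_size=4, bf16=True),
    train_dataset=load_dataset("imdb", split="train"),
    dataset_text_field="text",
    max_seq_length=512,
    tokenizer=tokenizer,
    peft_config=LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"),
)
trainer.train()
```

The exact accelerate FSDP configuration is not shown here and should be taken from the TRL/PEFT documentation.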
Other fixes
- set dev version by @younesbelkada in #1332
- Update stack llama 2 example to reflect #aa35fec by @nautsimon in #1333
- FIX: More user friendly error when users don't have PEFT by @younesbelkada in #1350
- fix 8-bit multi-gpu training bug by @fancyerii in #1353
- set seed in sft/dpo/reward_modeling to make result reproducible by @sywangyi in #1357
- Fix transformers version checking for Python < 3.8 by @samuki in #1363
- Add some arguments for support XPU by @yuanwu2017 in #1366
- ENH: Send Docker and transformers main CI results on slack after merging on main by @younesbelkada in #1370
- FEAT: [SFTTrainer] Add `eval_packing` by @younesbelkada in #1369
- FEAT: `force_use_ref_model` for power users by @younesbelkada in #1367
- FIX: fix after #1370 by @younesbelkada in #1372
- FIX: Change ci to fail-fast=False by @younesbelkada in #1373
- FIX: Fix the CI again .. by @younesbelkada in #1374
- Log ddpo reward as float to fix numpy conversion during bf16 training by @skavulya in #1391
- Fix the pad_token_id error by @yuanwu2017 in #1394
- FIX [RewardModeling] Fix RM script for PEFT by @younesbelkada in #1393
- Fix import error from deprecation in transformers by @lewtun in #1415
- CI: Fix CI on main by @younesbelkada in #1422
- [Kto] torch_dtype kwargs fix by @kashif in #1429
- Create standard dataset for TRL by @vwxyzjn in #1424
- FIX: fix doc build on main by @younesbelkada in #1437
- Fix PPOTrainer README example by @nikihowe in #1441
- Before update the tr_loss, make sure tr_loss_step is in the same device. by @pengwei715 in #1439
- Release: v0.8.0 by @younesbelkada in #1453
New Contributors
- @nautsimon made their first contribution in #1333
- @fancyerii made their first contribution in #1353
- @samuki made their first contribution in #1363
- @yuanwu2017 made their first contribution in #1366
- @kawine made their first contribution in #1380
- @skavulya made their first contribution in #1391
- @pengwei715 made their first contribution in #1439
Full Changelog: v0.7.11...v0.8.0