github huggingface/trl v0.4.2

17 months ago

QLoRA RLHF, SFT Trainer and RewardTrainer

This release of TRL adds support for training larger models with QLoRA (4-bit quantization through bitsandbytes) and introduces two new classes, RewardTrainer and SFTTrainer, so you can run your RLHF projects end-to-end!

Introducing SFTTrainer and RewardTrainer

Use the brand new trainers to train your reward model and your supervised fine-tuned (SFT) model with a few lines of code!
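As a rough illustration of what "a few lines of code" can look like, here is a minimal sketch assuming the v0.4.x-style arguments (`dataset_text_field`, `max_seq_length` passed straight to `SFTTrainer`); later TRL releases may expect them elsewhere. The helper names `format_example`, `build_sft_trainer` and `to_reward_features` are hypothetical and not part of TRL. The four reward column names are the ones RewardTrainer is assumed to read for chosen/rejected pairs.

```python
def format_example(prompt: str, completion: str) -> dict:
    """Join a prompt/completion pair into the single text column SFTTrainer reads."""
    return {"text": f"{prompt.strip()}\n{completion.strip()}"}


def build_sft_trainer(model_name: str, records: list):
    """Create an SFTTrainer over in-memory records (assumes v0.4.x-style arguments)."""
    # Imported lazily so the pure helpers work without trl/datasets installed.
    from datasets import Dataset
    from trl import SFTTrainer

    dataset = Dataset.from_list(records)  # records: list of {"text": ...} dicts
    return SFTTrainer(
        model_name,                 # a model id string or an already loaded model
        train_dataset=dataset,
        dataset_text_field="text",  # column that holds the training text
        max_seq_length=512,
    )


def to_reward_features(tokenize, chosen: str, rejected: str) -> dict:
    """Tokenize one preference pair into the paired columns used for reward training.

    `tokenize` is any callable returning a dict with "input_ids" and
    "attention_mask", for example a Hugging Face tokenizer.
    """
    c, r = tokenize(chosen), tokenize(rejected)
    return {
        "input_ids_chosen": c["input_ids"],
        "attention_mask_chosen": c["attention_mask"],
        "input_ids_rejected": r["input_ids"],
        "attention_mask_rejected": r["attention_mask"],
    }
```

With trl and datasets installed, something like `build_sft_trainer("facebook/opt-350m", records).train()` would start fine-tuning; the model id is only an example. A dataset mapped through `to_reward_features` could then be handed to `RewardTrainer` together with a sequence-classification model and its tokenizer.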

QLoRA integration

Pass 4-bit models directly into PPOTrainer for more memory-efficient training.
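To give a feel for the savings and for how a 4-bit model might be loaded, here is a hedged sketch. `weight_memory_gb` is a hypothetical back-of-the-envelope helper that counts weight storage only (it ignores quantization constants, activations, LoRA adapters and optimizer state). `load_4bit_policy` assumes that `AutoModelForCausalLMWithValueHead.from_pretrained` accepts `load_in_4bit` and `peft_config` as in this release; newer transformers versions may prefer a `quantization_config` object instead. The LoRA hyperparameters are illustrative, not recommended values.

```python
def weight_memory_gb(n_params: int, bits_per_param: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return n_params * bits_per_param / 8 / 1e9


def load_4bit_policy(model_name: str):
    """Load a causal LM with a value head in 4-bit and attach LoRA adapters."""
    # Imported lazily; needs trl, peft, bitsandbytes and a CUDA-capable GPU.
    from peft import LoraConfig
    from trl import AutoModelForCausalLMWithValueHead

    lora_config = LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
    )
    # The frozen base weights are quantized to 4 bits; only the adapters
    # and the value head are trained.
    return AutoModelForCausalLMWithValueHead.from_pretrained(
        model_name,
        load_in_4bit=True,
        peft_config=lora_config,
    )


# Weights of a 7B-parameter model: 16-bit versus 4-bit storage.
print(weight_memory_gb(7_000_000_000, 16))  # 14.0
print(weight_memory_gb(7_000_000_000, 4))   # 3.5
```

The object returned by `load_4bit_policy` is what would be passed as the `model` argument of `PPOTrainer`, alongside a `PPOConfig` and a tokenizer.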

Updated StackLlama example

Great work by @mnoukhov, who fixed the issues in StackLlama related to the new versions of accelerate, peft and transformers. The examples are now completely reproducible; see the changes below:

  • StackLLaMA: correctly merge peft model by @mnoukhov in #398
  • StackLlama: fixed RL training and added args by @mnoukhov in #400
  • Fixed some type annotations of trl.trainer.PPoTrainer by @JulesGM in #392
  • StackLLaMA: fix supervised finetuning and reward model training by @mnoukhov in #399

Bug fixes and improvements

New Contributors

Full Changelog: v0.4.1...v0.4.2
