huggingface/trl v0.3.0 on GitHub

What's Changed

fix style, typos, license by @natolambert in #103
fix re-added file by @natolambert in #116
add citation by @natolambert in #124
add manual seeding for RL experiments by @natolambert in #118
add set_seed to __init__.py by @lvwerra in #127
update docs with Seq2seq models, set_seed, and create_reference_model by @lvwerra in #128
[bug] Update gpt2-sentiment.py by @younesbelkada in #132
Fix Sentiment control notebook by @lvwerra in #126
realign values by @lvwerra in #137
Change unclear variables & fix typos by @natolambert in #134
Feat/reward summarization example by @TristanThrush in #115
[core] Small refactor of forward pass by @younesbelkada in #136
[tests] Add correct repo name by @younesbelkada in #138
fix forward batching for seq2seq and right padding models. by @lvwerra in #139
fix bug in batched_forward_pass by @ArvinZhuang in #144
[core] Add torch_dtype support by @younesbelkada in #147
[core] Fix dataloader issue by @younesbelkada in #154
[core] enable bf16 training by @younesbelkada in #156
[core] fix saving multi-gpu by @younesbelkada in #157
Added imports by @BirgerMoell in #159
Add CITATION.cff by @kashif in #169
[Doc] Add how to use Lion optimizer by @younesbelkada in #152
policy kl [old | new] by @kashif in #168
add minibatching by @lvwerra in #153
fix bugs in tutorial by @shizhediao in #175
[core] Add max_grad_norm support by @younesbelkada in #177
Add toxicity example by @younesbelkada in #162
[Docs] Fix barplot by @younesbelkada in #181

Full Changelog: v0.2.1...v0.3.0