General updates:
- Better serialization for all models and tokenizers (BERT, GPT, GPT-2 and Transformer-XL), with best practices for saving/loading documented in the readme and examples (see the sketch after this list)
- Relaxed network connection requirements (fall back on the last downloaded model in the cache when AWS can't be reached to check the ETag)
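
As a quick illustration of the save/load best practices mentioned above, here is a minimal sketch for a BERT model. The file names, the `save_vocabulary` helper, and the output directory are assumptions based on this release's readme and examples, not a definitive recipe:

```python
import os

import torch
from pytorch_pretrained_bert import BertForSequenceClassification, BertTokenizer

output_dir = "./finetuned_model"  # hypothetical path
os.makedirs(output_dir, exist_ok=True)

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Save: unwrap DataParallel if present, then store weights, config and vocab.
# File names follow the convention assumed in this release's examples.
model_to_save = model.module if hasattr(model, "module") else model
torch.save(model_to_save.state_dict(), os.path.join(output_dir, "pytorch_model.bin"))
with open(os.path.join(output_dir, "bert_config.json"), "w") as f:
    f.write(model_to_save.config.to_json_string())
tokenizer.save_vocabulary(output_dir)

# Reload: from_pretrained also accepts a directory containing these files.
model = BertForSequenceClassification.from_pretrained(output_dir, num_labels=2)
tokenizer = BertTokenizer.from_pretrained(output_dir)
```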
Breaking changes:
The `warmup_linear` method in `OpenAIAdam` and `BertAdam` is now replaced by flexible schedule classes for linear, cosine and multi-cycle schedules.
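
A minimal sketch of what the new schedule objects might look like in use, assuming `WarmupLinearSchedule` and `WarmupCosineSchedule` classes in `pytorch_pretrained_bert.optimization` and that `BertAdam` accepts a schedule instance in place of the old string; class names and signatures are assumptions:

```python
import torch
from pytorch_pretrained_bert import BertAdam
from pytorch_pretrained_bert.optimization import WarmupCosineSchedule, WarmupLinearSchedule

model = torch.nn.Linear(10, 2)  # stand-in model for illustration
num_train_steps = 1000

# Linear warmup over the first 10% of steps, then linear decay to zero.
schedule = WarmupLinearSchedule(warmup=0.1, t_total=num_train_steps)

# Alternatively, warmup followed by cosine decay:
# schedule = WarmupCosineSchedule(warmup=0.1, t_total=num_train_steps)

# The schedule object is passed where the old 'warmup_linear' string went.
optimizer = BertAdam(model.parameters(), lr=5e-5, schedule=schedule)
```

The multi-cycle variant mentioned above presumably corresponds to a hard-restarts cosine schedule in the same module.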
Bug fixes and improvements to the library modules:
- Add a flag in BertTokenizer to skip basic tokenization (@john-hewitt); a usage sketch follows this list
- Allow tokenization of sequences > 512 (@CatalinVoss)
- Clean up and extend learning rate schedules in BertAdam and OpenAIAdam (@lukovnikov)
- Update GPT/GPT-2 Loss computation (@CatalinVoss, @thomwolf)
- Make the TensorFlow conversion tool more robust (@marpaia)
- Fix BertForMultipleChoice model init and forward pass (@dhpollack)
- Fix gradient overflow in GPT-2 FP16 training (@SudoSharma)
- Catch the exception when pathlib is not installed (@potatochip)
- Use Dropout Layer in OpenAIGPTMultipleChoiceHead (@pglock)
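
As referenced in the first item above, here is a hedged sketch of the tokenizer changes; the `do_basic_tokenize` flag name and its pass-through via `from_pretrained` are assumptions based on the changes listed:

```python
from pytorch_pretrained_bert import BertTokenizer

# WordPiece-only tokenization, skipping the basic (whitespace/punctuation)
# tokenizer; do_basic_tokenize is the assumed flag name.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", do_basic_tokenize=False)
tokens = tokenizer.tokenize("unaffable")

# Per the change above, sequences longer than 512 tokens can now be
# tokenized; the pretrained models still accept at most 512 positions,
# so truncate before converting to ids.
long_tokens = tokenizer.tokenize("hello world " * 1000)
input_ids = tokenizer.convert_tokens_to_ids(long_tokens[:512])
```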
New scripts and improvements to the examples scripts:
- Add BERT language model fine-tuning scripts (@Rocketknight1)
- Add the SST-2 task and the remaining GLUE tasks to `run_classifier.py` (@ananyahjha93, @jplehmann)
- GPT-2 generation fixes (@CatalinVoss, @spolu, @dhanajitb, @8enmann, @SudoSharma, @cynthia)