Reformer (@patrickvonplaten)
- Added a new model, Reformer (https://arxiv.org/abs/2001.04451), to the library. The original Trax code (https://github.com/google/trax/tree/master/trax/models/reformer) was translated to PyTorch.
- Reformer uses chunked attention and reversible layers to model sequences as long as 500,000 tokens.
- Reformer is currently available as a causal language model and will soon also be available as an encoder-only ("BERT"-like) model.
- Two pretrained checkpoints have been uploaded: https://huggingface.co/models?search=google%2Freformer
- https://huggingface.co/google/reformer-enwik8 is the first character-level language model in the library.
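As a quick illustration, here is a minimal hedged sketch of loading one of the uploaded checkpoints as a causal LM and sampling from it (the prompt and generation settings are illustrative, not from the release):

```python
# Minimal sketch: load an uploaded Reformer checkpoint as a causal LM and
# sample from it. Assumes a transformers version that includes Reformer and
# the sentencepiece package.
from transformers import ReformerModelWithLMHead, ReformerTokenizer

tokenizer = ReformerTokenizer.from_pretrained("google/reformer-crime-and-punishment")
model = ReformerModelWithLMHead.from_pretrained("google/reformer-crime-and-punishment")

input_ids = tokenizer.encode("A few months later", return_tensors="pt")
generated = model.generate(input_ids, do_sample=True, max_length=100)
print(tokenizer.decode(generated[0]))
```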
Additional architectures
- The `ElectraForSequenceClassification` model was added by @liuzzi
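Below is a hedged usage sketch of the new head (the checkpoint and `num_labels` are illustrative assumptions):

```python
# Hedged sketch: sequence classification with the newly added ELECTRA head.
# Checkpoint and label count are illustrative, not from the release notes.
import torch
from transformers import ElectraForSequenceClassification, ElectraTokenizer

tokenizer = ElectraTokenizer.from_pretrained("google/electra-small-discriminator")
model = ElectraForSequenceClassification.from_pretrained(
    "google/electra-small-discriminator", num_labels=2
)

inputs = tokenizer.encode("This release looks great!", return_tensors="pt")
with torch.no_grad():
    logits = model(inputs)[0]  # models return tuples here; logits come first
print(logits.argmax(dim=-1))
```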
Trainer tweaks and fixes (@LysandreJik, @julien-c)
TPU (@LysandreJik):
- Model saving, as well as optimizer and scheduler saving mid-training, was hanging
- Fixed the optimizer weight updates
Trainer (@julien-c)
- Fixed `nn.DataParallel` compatibility for PyTorch v1.5.0
- Distributed evaluation: `SequentialDistributedSampler` + gather all results (see the sketch after this list)
- Move model to correct device
- Map optimizer to correct device after loading from checkpoint (@shaoyent)
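For illustration, a minimal sketch of the gather step used in distributed evaluation (the helper name and padding handling are assumptions, not necessarily the Trainer's exact internals): each process evaluates a contiguous shard of the dataset, then results are all-gathered in rank order and truncated back to the true dataset length.

```python
# Hedged sketch of gathering per-process evaluation results after sharding
# the dataset with a sequential distributed sampler.
import torch
import torch.distributed as dist

def distributed_concat(tensor: torch.Tensor, num_total_examples: int) -> torch.Tensor:
    # All-gather the shard from every process; shards have equal size because
    # the sampler pads the dataset to be evenly divisible by the world size.
    output_tensors = [tensor.clone() for _ in range(dist.get_world_size())]
    dist.all_gather(output_tensors, tensor)
    concat = torch.cat(output_tensors, dim=0)
    # Drop the padding samples to recover the original dataset length.
    return concat[:num_total_examples]
```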
QOL: Tokenization, Pipelines
- New method for all tokenizers: `tokenizer.decode_batch`, to decode an entire batch (@sshleifer)
- The NER pipeline now returns entity groups (@enzoampil)
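A short hedged sketch of both additions (the checkpoint and the `grouped_entities` flag are assumptions on our part):

```python
# Hedged sketch of the two quality-of-life additions; checkpoint choice and
# the grouped_entities flag are illustrative assumptions.
from transformers import AutoTokenizer, pipeline

# Decode a whole batch of encoded sequences in a single call.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
encoded = tokenizer.batch_encode_plus(["Hello world!", "How are you?"])
print(tokenizer.decode_batch(encoded["input_ids"]))

# NER pipeline returning grouped entities instead of per-token predictions.
ner = pipeline("ner", grouped_entities=True)
print(ner("Hugging Face Inc. is based in New York City."))
```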
ONNX Conversion script (@mfuntowicz)
- Added a conversion script to convert both PyTorch and TensorFlow models to ONNX.
- Added a notebook explaining how it works
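For reference, a hedged sketch of invoking the conversion programmatically (the module path, `convert` signature, and checkpoint are assumptions; the script and notebook are the authoritative usage):

```python
# Hedged sketch: convert a PyTorch checkpoint to ONNX with the new script's
# convert() helper. Exact signature may differ; see the notebook.
from transformers.convert_graph_to_onnx import convert

convert(
    framework="pt",                      # "pt" = PyTorch, "tf" = TensorFlow
    model="bert-base-cased",             # illustrative checkpoint
    output="onnx/bert-base-cased.onnx",  # destination for the ONNX graph
    opset=11,                            # ONNX opset version
)
```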
Community notebooks
We've started adding community notebooks to the repository. Three notebooks have made their way into our codebase.
Predict stage for GLUE task, easy submit to gluebenchmark.com
- Adds a predict stage for GLUE tasks and generates result files that can be submitted to gluebenchmark.com (@stdcoutzyx)
Fixes and improvements
- Support flake8 3.8 (@julien-c)
- Tests are now faster thanks to smaller dummy models (@sshleifer)
- Fixed the eval loss in the trainer (@patil-suraj)
- Fixed the `p_mask` in SQuAD pre-processing (@LysandreJik)
- GitHub Actions PyTorch tests are no longer pinned to `torch==1.4.0` (@mfuntowicz)
- Fixed the multiple-choice script with overflowing tokens (@LysandreJik)
- Allow for `None` values in `GradientAccumulator` (@jarednielsen, improved by @jplu)
- MBart tokenizer saving/loading was fixed (@Mehrad0711)
- TF generation: fixed an issue with batch generation of outputs with different lengths (@patrickvonplaten)
- Fixed FP16 support in the T5 model (@patrickvonplaten)
- `run_language_modeling` fix: actually use the `overwrite_cache` argument (@borisdayma)
- Better, version-compatible way to get the learning rate in the trainer (@rakeshchada)
- Fixed the slow tests that were failing on GPU (@sshleifer, @patrickvonplaten, @LysandreJik)
- ONNX conversion tokenizer fix (@RensDimmendaal)
- Correct TF formatting to exclude LayerNorms from weight decay (@oliverastrand)
- Removed a deprecation warning (@Colanim)
- Fixed a missing `no_grad` in the second pruning pass in `run_bertology` (@TobiasLee)