Bert Seq2Seq models, FSMT, Funnel Transformer, LXMERT

BERT Seq2seq models

The BertGeneration model is a BERT model that can be leveraged for sequence-to-sequence tasks using EncoderDecoderModel as proposed in Leveraging Pre-trained Checkpoints for Sequence Generation Tasks by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.

It was added to the library in PyTorch with the following checkpoints:

google/roberta2roberta_L-24_bbc
google/roberta2roberta_L-24_gigaword
google/roberta2roberta_L-24_cnn_daily_mail
google/roberta2roberta_L-24_discofuse
google/roberta2roberta_L-24_wikisplit
google/bert2bert_L-24_wmt_de_en
google/bert2bert_L-24_wmt_en_de

Contributions:

Add "Leveraging Pretrained Checkpoints for Generation" Seq2Seq models. #6594 (@patrickvonplaten)

FSMT (FairSeq MachineTranslation)

FSMT (FairSeq MachineTranslation) models were introduced in Facebook FAIR’s WMT19 News Translation Task Submission by Nathan Ng, Kyra Yee, Alexei Baevski, Myle Ott, Michael Auli, Sergey Edunov.

It was added to the library in PyTorch, with the following checkpoints:

facebook/wmt19-en-ru
facebook/wmt19-en-de
facebook/wmt19-ru-en
facebook/wmt19-de-en

Contributions:

[ported model] FSMT (FairSeq MachineTranslation) #6940 (@stas00)
build/eval/gen-card scripts for fsmt #7155 (@stas00)
skip failing FSMT CUDA tests until investigated #7220 (@stas00)
[fsmt] rewrite SinusoidalPositionalEmbedding + USE_CUDA test fixes + new TranslationPipeline test #7224 (@stas00)
[s2s] adjust finetune + test to work with fsmt #7263 (@stas00)
[fsmt] SinusoidalPositionalEmbedding no need to pass device #7292 (@stas00)
Adds FSMT to LM head AutoModel #7312 (@LysandreJik)

LayoutLM

The LayoutLM model was proposed in LayoutLM: Pre-training of Text and Layout for Document Image Understandin by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, and Ming Zhou. It’s a simple but effective pre-training method of text and layout for document image understanding and information extraction tasks, such as form understanding and receipt understanding.

It was added to the library in PyTorch with the following checkpoints:

layoutlm-base-uncased
layoutlm-large-uncased

Contributions:

Add LayoutLM Model #7064 (@liminghao1630)
Fixes for LayoutLM #7318 (@sgugger)

Funnel Transformer

The Funnel Transformer model was proposed in the paper Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing. It is a bidirectional transformer model, like BERT, but with a pooling operation after each block of layers, a bit like in traditional convolutional neural networks (CNN) in computer vision.

It was added to the library in both PyTorch and TensorFlow, with the following checkpoints:

funnel-transformer/small
funnel-transformer/small-base
funnel-transformer/medium
funnel-transformer/medium-base
funnel-transformer/intermediate
funnel-transformer/intermediate-base
funnel-transformer/large
funnel-transformer/large-base
funnel-transformer/xlarge
funnel-transformer/xlarge-base

Contributions:

Funnel transformer #6908 (@sgugger)
Add TF Funnel Transformer #7029 (@sgugger)

LXMERT

The LXMERT model was proposed in LXMERT: Learning Cross-Modality Encoder Representations from Transformers by Hao Tan & Mohit Bansal. It is a series of bidirectional transformer encoders (one for the vision modality, one for the language modality, and then one to fuse both modalities) pre-trained using a combination of masked language modeling, visual-language text alignment, ROI-feature regression, masked visual-attribute modeling, masked visual-object modeling, and visual-question answering objectives. The pretraining consists of multiple multi-modal datasets: MSCOCO, Visual-Genome + Visual-Genome Question Answering, VQA 2.0, and GQA.

It was added to the library in TensorFlow with the following checkpoints:

unc-nlp/lxmert-base-uncased
unc-nlp/lxmert-vqa-uncased
unc-nlp/lxmert-gqa-uncased

Contributions

Adding the LXMERT pretraining model (MultiModal languageXvision) to HuggingFace's suite of models #5793 (@eltoto1219)
[LXMERT] Fix tests on gpu #6946 (@patrickvonplaten)

New pipelines

The following pipeline was added to the library:

[pipelines] Text2TextGenerationPipeline #6744 (@patil-suraj)

Notebooks

The following community notebooks were contributed to the library:

Demoing LXMERT with raw images by incorporating the FRCNN model for roi-pooled extraction and bounding-box predction on the GQA answer set. #6986 (@eltoto1219)
[Community notebooks] Add notebook on fine-tuning GPT-2 Model with Trainer Class #7005 (@philschmid)
Add "Fine-tune ALBERT for sentence-pair classification" notebook to the community notebooks #7255 (@NadirEM)
added multilabel text classification notebook using distilbert to community notebooks #7201 (@DhavalTaunk08)

Encoder-decoder architectures

An additional encoder-decoder architecture was added:

[EncoderDecoder] Add xlm-roberta to encoder decoder #6878 (@patrickvonplaten)

Bug fixes and improvements

TF Flaubert w/ pre-norm #6841 (@LysandreJik)
Dataset and DataCollator for BERT Next Sentence Prediction (NSP) task #6644 (@HuangLianzhe)
Fix in Adafactor docstrings #6845 (@sgugger)
Fix resuming training for Windows #6847 (@sgugger)
Only access loss tensor every logging_steps #6802 (@jysohn23)
Marian distill scripts + integration test #6799 (@sshleifer)
Add checkpointing to Ray Tune HPO #6747 (@krfricke)
Split hp search methods #6857 (@sgugger)
Update ONNX notebook to include section on quantization. #6831 (@mfuntowicz)
Fix marian slow test #6854 (@sshleifer)
[s2s] command line args for faster val steps #6833 (@sshleifer)
Bart can make decoder_input_ids from labels #6758 (@sshleifer)
add a final report to all pytest jobs #6861 (@stas00)
Logging doc #6852 (@sgugger)
Restore PaddingStrategy.MAX_LENGTH on QAPipeline while no v2. #6875 (@mfuntowicz)
[Generate] Facilitate PyTorch generate using ModelOutputs #6735 (@patrickvonplaten)
Add cache_dir to save features TextDataset #6879 (@jysohn23)
[Docs, Examples] Fix QA example for PT #6890 (@patrickvonplaten)
Update modeling_bert.py #6897 (@parthe)
[Electra] fix warning for position ids #6884 (@patrickvonplaten)
minor docs grammar fixes #6889 (@harrywang)
Fix error class instantiation #6634 (@tamuhey)
Output attention takes an s #6903 (@sgugger)
[testing] fix ambiguous test #6898 (@stas00)
test_tf_common: remove un_used mixin class parameters #6866 (@PuneethaPai)
Template updates #6914 (@sgugger)
Changed link to the correct paper in the second paragraph #6905 (@sengl)
tweak tar command in readme #6919 (@brettkoonce)
[s2s]: script to convert pl checkpoints to hf checkpoints #6911 (@sshleifer)
[s2s] allow task_specific_params=summarization_xsum #6923 (@sshleifer)
move wandb/comet logger init to train() to allow parallel logging #6850 (@krfricke)
[s2s] use --eval_beams command line arg #6926 (@sshleifer)
[s2s] support early stopping based on loss, rather than rouge #6927 (@sshleifer)
Fix mixed precision issue in TF DistilBert #6915 (@chiapas)
[docstring] misc arg doc corrections #6932 (@stas00)
[s2s] distill: --normalize_hidden --supervise_forward #6834 (@sshleifer)
[s2s] run_eval.py parses generate_kwargs #6948 (@sshleifer)
[doc] remove the implied defaults to :obj:None, s/True/ :obj:`True/, etc. #6956 (@stas00)
[s2s] warn if --fp16 for torch 1.6 #6977 (@sshleifer)
feat: allow prefix for any generative model #5885 (@borisdayma)
Trainer with grad accum #6930 (@sgugger)
Cannot index None #6984 (@LysandreJik)
[docstring] missing arg #6933 (@stas00)
[testing] add dependency: parametrize #6958 (@stas00)
Fixed the default number of attention heads in Reformer Configuration #6973 (@tznurmin)
[gen utils] missing else case #6980 (@stas00)
match CI's version of flake8 #6941 (@stas00)
Conversion scripts shouldn't have relative imports #6991 (@LysandreJik)
Add missing arguments for BertWordPieceTokenizer #5810 (@monologg)
fixed trainer tr_loss memory leak #6999 (@StuartMesham)
Floating-point operations logging in trainer #6768 (@TevenLeScao)
Fixing FLOPS merge by checking if torch is available #7013 (@LysandreJik)
[Longformer] Fix longformer documentation #7016 (@patrickvonplaten)
pegasus.rst: fix expected output #7017 (@sshleifer)
adding TRANSFORMERS_VERBOSITY env var #6961 (@stas00)
[generation] consistently add eos tokens #6982 (@stas00)
[from_pretrained] Allow tokenizer_type ≠ model_type #6995 (@julien-c)
replace torch.triu with onnx compatible code #6929 (@HenryDashwood)
Batch encore plus and overflowing tokens fails when non existing overflowing tokens for a sequence #6677 (@LysandreJik)
add -y to bypass prompt for transformers-cli upload #7035 (@stas00)
Fix confusing warnings during TF2 import from PyTorch #6623 (@jcrocholl)
Albert pretrain datasets/ datacollator #6168 (@yl-to)
Fix template #7040 (@LysandreJik)
Small fixes in tf template #7044 (@sgugger)
Add "Leveraging Pretrained Checkpoints for Generation" Seq2Seq models. #6594 (@patrickvonplaten)
fix to ensure that returned tensors after the tokenization is Long #7039 (@GeetDsa)
[BertGeneration] Correct Doc Title #7048 (@patrickvonplaten)
[BertGeneration, Docs] Fix another old name in docs #7050 (@patrickvonplaten)
[xlm tok] config dict: fix str into int to match definition #7034 (@stas00)
[s2s] --eval_max_generate_length #7018 (@sshleifer)
Fix CI with change of name of nlp #7054 (@sgugger)
[wip/s2s] DistributedSortishSampler #7056 (@sshleifer)
these tests require non-multigpu env #7059 (@stas00)
[BertGeneration] Clean naming #7068 (@patrickvonplaten)
Document the dependcy on datasets #7058 (@sgugger)
Automate the lists in auto-xxx docs #7061 (@sgugger)
Add tests and fix various bugs in ModelOutput #7073 (@sgugger)
Compute loss method #7074 (@sgugger)
[T5Tokenizer] remove prefix_tokens #7078 (@patil-suraj)
[s2s] run_eval supports --prefix clarg. #6953 (@sshleifer)
fix bug in pegasus converter #7094 (@sshleifer)
[s2s] two stage run_distributed_eval.py #7105 (@sshleifer)
Update xsum length penalty to better values #7107 (@sshleifer)
[s2s] distributed eval cleanup #7110 (@sshleifer)
[s2s distill] allow pegasus-12-12 #7104 (@sshleifer)
Temporarily skip failing tests due to dependency change #7118 (@LysandreJik)
fix link to paper #7116 (@btel)
ignore FutureWarning in tests #7079 (@stas00)
fix deprecation warnings #7033 (@stas00)
[examples testing] restore code #7099 (@stas00)
Clean up autoclass doc #7081 (@sgugger)
Add Mirror Option for Downloads #6679 (@JetRunner)
[s2s] distributed eval in one command #7124 (@sshleifer)
[QOL] add signature for prepare_seq2seq_batch #7108 (@sshleifer)
Fix reproducible tests in Trainer #7119 (@sgugger)
[logging] remove no longer needed verbosity override #7100 (@stas00)
Fix TF Trainer loss calculation #6998 (@chiapas)
Add quotes to paths in MeCab arguments #7142 (@polm)
Multi predictions trainer #7126 (@sgugger)
fix ZeroDivisionError and epoch counting #7125 (@chiapas)
[EncoderDecoderModel] fix indentation error #7131 (@patrickvonplaten)
[docs] add testing documentation #7101 (@stas00)
Refactoring the TF activations functions #7150 (@jplu)
fix the warning message of overflowed sequence #7151 (@xiye17)
[doc] [testing] improve/expand the Parametrization section #7156 (@stas00)
Add empty random document case to DataCollatorForNextSentencePrediction #7161 (@choidongyeon)
[s2s run_eval] new features #7109 (@stas00)
use the correct add_start_docstrings #7174 (@stas00)
[s2s] distributed eval cleanup #7186 (@sshleifer)
remove duplicated code #7173 (@stas00)
remove deprecated flag #7171 (@stas00)
Transformer-XL: Remove unused parameters #7087 (@RafaelWO)
Trainer multi label #7191 (@sgugger)
Change to use relative imports in some files & Add python prompt symbols to example codes #7202 (@soheeyang)
[s2s] run_eval/run_eval_search tweaks #7192 (@stas00)
[s2s] dynamic batch size with --max_tokens_per_batch #7030 (@sshleifer)
[s2s] remove double assert #7223 (@sshleifer)
Add customized text to widget #7204 (@mrm8488)
Rewrites BERT in Flax to the new Linen API #7211 (@marcvanzee)
token-classification: update url of GermEval 2014 dataset #6571 (@stefan-it)
Fix a few countings (steps / epochs) in trainer_tf.py #7175 (@chiapas)
Add new pre-trained models BERTweet and PhoBERT #6129 (@datquocnguyen)
[s2s] distributed_eval.py saves better speed info #7242 (@sshleifer)
[testing doc] @slow has to be last #7251 (@stas00)
examples/seq2seq/init.py mutates sys.path #7194 (@stas00)
[Bug fix] Fixed target_mapping preparation for XLNet (Pytorch) #7267 (@guillaume-be)
[example/glue] fix compute_metrics_fn for bart like models #7248 (@patil-suraj)
Disable missing weight warning for RobertaForMaskedLM/CamembertForMaskedLM #7282 (@raphael0202)
Fix #7284 #7289 (@sgugger)
[s2s tests] fix test_run_eval_search #7297 (@stas00)
[s2s] s/alpha_loss_encoder/alpha_encoder_loss/ #7298 (@stas00)
[s2s] save hostname with repo info #7301 (@sshleifer)
Copy code from Bert to Roberta and add safeguard script #7219 (@sgugger)
Fix #7304 #7305 (@sgugger)
Fix saving TF custom models #7291 (@jplu)
is_pretokenized -> is_split_into_words #7236 (@sgugger)
Add possibility to evaluate every epoch #7302 (@sgugger)
Support for Windows in check_copies #7316 (@sgugger)
Create an XLA parameter and fix the mixed precision #7311 (@jplu)

huggingface/transformers v3.2.0 Bert Seq2Seq models, FSMT, LayoutLM, Funnel Transformer, LXMERT on GitHub

Bert Seq2Seq models, FSMT, Funnel Transformer, LXMERT