v4.2.0: LED from AllenAI, encoder-decoder templates, fast imports
LED from AllenAI (@patrickvonplaten)
Four new models are released as part of the LED implementation: `LEDModel`, `LEDForConditionalGeneration`, `LEDForSequenceClassification`, `LEDForQuestionAnswering`, in PyTorch. The first two models have a TensorFlow version.
LED is the encoder-decoder variant of the Longformer model by AllenAI.
The LED model was proposed in Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, and Arman Cohan.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=led
Available notebooks:
- Evaluation: https://colab.research.google.com/drive/12INTTR6n64TzS4RrXZxMSXfrOd9Xzamo?usp=sharing
- Finetuning: https://colab.research.google.com/drive/12LjJazBl7Gam0XBPy_y0CTOJZeZ34c2v?usp=sharing
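For a quick feel of the API, here is a minimal sketch of generation with LED. The checkpoint name and generation settings below are illustrative only; note that the base checkpoint is not fine-tuned, so this demonstrates the API rather than a useful summary.

```python
from transformers import LEDForConditionalGeneration, LEDTokenizer

# Illustrative checkpoint; any LED checkpoint from the Hub filter above should work.
tokenizer = LEDTokenizer.from_pretrained("allenai/led-base-16384")
model = LEDForConditionalGeneration.from_pretrained("allenai/led-base-16384")

# LED is built for long documents (up to 16k tokens with this checkpoint).
long_text = "Transformers are taking the NLP world by storm. " * 500
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=16384)

generated_ids = model.generate(inputs["input_ids"], num_beams=2, max_length=64)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```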
Contributions:
- LED #9278 (@patrickvonplaten)
- [LED Test] fix common inputs pt for flaky pt-tf led test #9459 (@SBrandeis, @patrickvonplaten)
- [TF Led] Fix flaky TF Led test #9513 (@patrickvonplaten)
Generation Scores & other outputs (@patrickvonplaten)
The PyTorch generation function can now return:
- `scores` - the logits generated at each step
- `attentions` - all attention weights at each generation step
- `hidden_states` - all hidden states at each generation step

simply by adding `return_dict_in_generate` to the config or as an input to `.generate()`.
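As a rough sketch of the new flags introduced by the PR linked below (the model and prompt here are just placeholders):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

input_ids = tokenizer("Hello, my name is", return_tensors="pt").input_ids

# Request a structured output instead of a bare tensor of token ids.
outputs = model.generate(
    input_ids,
    max_length=20,
    return_dict_in_generate=True,
    output_scores=True,
    output_attentions=True,
    output_hidden_states=True,
)

print(outputs.sequences)    # the generated token ids
print(len(outputs.scores))  # one logits tensor per generated step
```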
Notebooks for a better explanation:
- https://discuss.huggingface.co/t/announcement-generationoutputs-scores-attentions-and-hidden-states-now-available-as-outputs-to-generate/3094/2
- https://discuss.huggingface.co/t/generation-probabilities-how-to-compute-probabilities-of-output-scores-for-gpt2/3175
PR:
- Add flags to return scores, hidden states and / or attention weights in GenerationMixin #9150 (@SBrandeis)
TensorFlow improvements
TensorFlow BERT-like model improvements (@jplu)
The TensorFlow versions of the BERT-like models have been updated and are now twice as fast as the previous versions.
Better integration in TensorFlow Serving (@jplu)
This version introduces a new API for TensorFlow saved models, which can now be exported with `model.save_pretrained("path", saved_model=True)` and easily loaded into a TensorFlow Serving environment.
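A minimal sketch of the export path (the model name and output directory are placeholders):

```python
from transformers import TFBertModel

model = TFBertModel.from_pretrained("bert-base-uncased")

# Writes a TensorFlow SavedModel (in addition to the usual config and weight
# files) that standard TensorFlow Serving tooling can load.
model.save_pretrained("exported_bert", saved_model=True)
```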
DeepSpeed integration (@stas00)
Initial support for DeepSpeed to accelerate distributed training on several GPUs. This is an experimental feature that hasn't been fully tested yet, but early results are very encouraging (see this comment). Stay tuned for more details in the coming weeks!
Model templates (@patrickvonplaten)
The encoder-decoder version of the templates is now part of Transformers! Adding an encoder-decoder model is made very easy with this addition. More information can be found in the README.
- Model Templates for Seq2Seq #9251 (@patrickvonplaten)
- [Seq2Seq Templates] Add embedding scale to templates #9342 (@patrickvonplaten)
- [Seq2Seq Templates] Add forgotten imports to templates #9346 (@patrickvonplaten)
Faster import (@sgugger)
The initialization process has been changed to only import what is required. Therefore, when using only PyTorch models, TensorFlow will not be imported and vice-versa. In the best situations the import of a transformers model now takes only a few hundred milliseconds (~200ms) compared to several seconds (~3s) in previous versions.
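If you want to check the effect on your own setup, a quick informal timing looks like this; actual numbers depend on hardware and on which backends are installed:

```python
import time

start = time.time()
import transformers  # the reworked init only imports what is required (see above)

print(f"`import transformers` took {time.time() - start:.3f}s")
print(transformers.__version__)
```

Alternatively, `python -X importtime -c "import transformers"` gives a per-module breakdown of where the import time goes.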
- Fast transformers import part 1 #9441 (@sgugger)
- Transformers fast import part 2 #9446 (@sgugger)
- Fast imports part 3 #9474 (@sgugger)
Documentation highlights (@Qbiwan, @NielsRogge)
Some models now have improved documentation. The `LayoutLM` model has seen a general overhaul in its documentation thanks to @NielsRogge. The tokenizer-only models `Bertweet`, `Herbert` and `Phobert` now have their own documentation pages thanks to @Qbiwan.
- Improve LayoutLM #9476 (@NielsRogge)
- Improve documentation coverage for Bertweet #9379 (@Qbiwan)
- Improve documentation coverage for Herbert #9428 (@Qbiwan)
- Improve documentation coverage for Phobert #9427 (@Qbiwan)
Breaking changes
There are no breaking changes between the previous version and this one.
This will be the first version to require TensorFlow >= 2.3.
General improvements and bugfixes
- add tests for the new sharded ddp fairscale integration #9177 (@stas00)
- Added TF CTRL Sequence Classification #9151 (@spatil6)
- [trainer] apex fixes and tests #9180 (@stas00)
- Fix link to old NER fine-tuning script #9182 (@mrm8488)
- fixed not JSON serializable error in run_qa.py with fp16 #9186 (@WissamAntoun)
- [setup] correct transformers version format #9176 (@stas00)
- Fix link to old SQUAD fine-tuning script #9181 (@mrm8488)
- Add new run_swag example #9175 (@sgugger)
- Add timing inside Trainer #9196 (@sgugger)
- GPT-model attention heads pruning example #9189 (@altsoph)
- [t5 doc] typos #9199 (@stas00)
- [run_glue] add speed metrics #9198 (@stas00)
- Added TF TransfoXL Sequence Classification #9169 (@spatil6)
- [finetune trainer] better logging and help #9203 (@stas00)
- [RAG] Add Ray implementation for distributed retrieval #9197 (@amogkam)
- [T5] Fix warning for changed EncDec Attention Bias weight #9231 (@patrickvonplaten)
- Improve BERT-like models performance with better self attention #9124 (@jplu)
- Fix TF template #9234 (@jplu)
- Fix beam search generation for GPT2 and T5 on model parallelism #9219 (@TobiasNorlund)
- add base model classes to bart subclassed models #9230 (@patil-suraj)
- [MPNet] Add slow to fast tokenizer converter #9233 (@patrickvonplaten)
- Adding performer fine-tuning research example #9239 (@TevenLeScao)
- Update the README of the text classification example #9237 (@sgugger)
- [EncoderDecoder] Make tests more aggressive #9256 (@patrickvonplaten)
- Fix script that check objects are documented #9259 (@sgugger)
- Seq2seq trainer #9241 (@sgugger)
- Fix link to old language modeling script #9254 (@mrm8488)
- Fix link to bertabs/README.md #9255 (@mrm8488)
- Fix TF BART for saved model creation #9252 (@jplu)
- Add speed metrics to all example scripts + template #9260 (@sgugger)
- Revert renaming in finetune_trainer #9262 (@sgugger)
- Fix gpt2 document #9272 (@xu-song)
- Fix param error #9273 (@xu-song)
- [Seq2Seq Templates] Fix check_repo.py templates file #9277 (@patrickvonplaten)
- Minor documentation revisions from copyediting #9266 (@connorbrinton)
- Adapt to new name of `label_smoothing_factor` training arg #9282 (@sgugger)
- Add caching mechanism to BERT, RoBERTa #9183 (@patil-suraj)
- [Templates] Adapt Bert #9284 (@patrickvonplaten)
- allow integer device for BatchEncoding #9271 (@jethrokuan)
- Fix typo in file_utils.py #9289 (@jungwhank)
- [bert_generation] enable cache by default #9296 (@patil-suraj)
- Proposed Fix : [RagSequenceForGeneration] generate "without" input_ids #9220 (@ratthachat)
- fix typo in modeling_encoder_decoder.py #9297 (@daniele-sartiano)
- Update tokenization_utils_base.py #9293 (@BramVanroy)
- [Bart doc] Fix outdated statement #9299 (@patrickvonplaten)
- add translation example #9303 (@vasudevgupta7)
- [GPT2] Correct gradient checkpointing #9308 (@patrickvonplaten)
- [Seq2SeqTrainer] Fix Typo #9320 (@patrickvonplaten)
- [Seq2Seq Templates] Correct some TF-serving errors and add gradient checkpointing to PT by default. #9334 (@patrickvonplaten)
- Fix TF T5 #9301 (@jplu)
- Fix TF TransfoXL #9302 (@jplu)
- [prophetnet] wrong import #9349 (@stas00)
- [apex.normalizations.FusedLayerNorm] torch.cuda.is_available() is redundant as apex handles that internally #9350 (@stas00)
- Make sure to use return dict for the encoder call inside RagTokenForGeneration #9363 (@dblakely)
- [Docs] `past_key_values` return a tuple of tuple as a default #9381 (@patrickvonplaten)
- [docs] Fix TF base model examples: outputs.last_hidden_states -> state #9382 (@ck37)
- Fix typos in README and bugs in RAG example code for end-to-end evaluation and finetuning #9355 (@yoshitomo-matsubara)
- Simplify marian distillation script #9394 (@sshleifer)
- Add utility function for retrieving locally cached models #8836 (@cdpierse)
- Fix TF CTRL #9291 (@jplu)
- Put back LXMert example #9401 (@sgugger)
- Bump notebook from 6.1.4 to 6.1.5 in /examples/research_projects/lxmert #9402 (@dependabot[bot])
- Fix TF Flaubert #9292 (@jplu)
- [trainer] parametrize default output_dir #9352 (@stas00)
- Fix utils on Windows #9368 (@jplu)
- Fix TF DPR #9283 (@jplu)
- [Docs] Tokenizer Squad 2.0 example #9378 (@patrickvonplaten)
- replace apex.normalization.FusedLayerNorm with torch.nn.LayerNorm #9386 (@stas00)
- [test_model_parallelization] multiple fixes #9354 (@stas00)
- Fix TF Longformer #9348 (@jplu)
- [logging] autoflush #9385 (@stas00)
- TF >= 2.3 cleaning #9369 (@jplu)
- [trainer] --model_parallel hasn't been implemented for most models #9347 (@stas00)
- Fix TF Funnel #9300 (@jplu)
- Fix documentation links always pointing to master. #9217 (@sugeeth14)
- [examples/text-classification] Fix a bug for using own regression dataset #9411 (@forest1988)
- [trainer] group fp16 args together #9409 (@stas00)
- [model parallel] add experimental warning #9412 (@stas00)
- improve readme text to private models/versioning/api #9424 (@clmnt)
- [PyTorch Bart] Split Bart into different models #9343 (@patrickvonplaten)
- [docs] outline sharded ddp doc #9208 (@stas00)
- [Refactor] Splitting pipelines.py into its own module. #9279 (@Narsil)
- Fix link to Evaluate TAPAS Notebook #9414 (@mrm8488)
- Fix link to Notebook to fine-tune TAPAS #9413 (@mrm8488)
- Allow example to use a revision and work with private models #9407 (@sgugger)
- [trainer] self.model_wrapped + _model_unwrap #9390 (@stas00)
- Fix URLs to TAPAS notebooks #9435 (@NielsRogge)
- Upgrade styler to better handle lists #9423 (@sgugger)
- [Docs] Add useful links to model sharing #9431 (@patrickvonplaten)
- Store transformers version info when saving the model #9421 (@JetRunner)
- [GenerationOutputs] Fix GenerationOutputs Tests #9443 (@patrickvonplaten)
- Remove nested lxmert #9440 (@sgugger)
- [make fixup] a more reliable version of branching point discovery #9449 (@stas00)
- Prophetnet optimization #9453 (@guillaume-be)
- New serving #9419 (@jplu)
- [Docs] Improve model sharing doc #9454 (@patrickvonplaten)
- [TFGPT2] - Fix flaky past_key_values test #9460 (@patrickvonplaten)
- Removing duplicated code for Translation,Summarization and Text2TextGeneration pipelines #9433 (@Narsil)
- [README] Add new models #9465 (@patrickvonplaten)
- [Generation] Fix bug for manual decoder_input_ids + warning message #9472 (@patrickvonplaten)
- Makes HfArgumentParser compatible with Python 3.9 #9479 (@Tpt)
- Fix TF input for np.ndarray #9294 (@jplu)
- Making Conversation possible to create directly a full conversation #9434 (@Narsil)
- fix(wandb): fix config #9489 (@borisdayma)
- Fixing tests. It seems master changed something in the warnings. #9483 (@Narsil)
- Reformat the TF serving outputs #9482 (@jplu)
- [ray] add maintainers for Ray / Tune #9499 (@richardliaw)
- Fix template #9504 (@jplu)
- Full rework of the TF input/output embeddings and bias resizing #9193 (@jplu)
- Remove tolerance + drop_rows_to_fit by default #9507 (@LysandreJik)
- Fix template #9512 (@jplu)
- New Updated DistilGPT-2 Finetuning and Generation #9494 (@tripathiaakash)
- Make doc styler detect lists on rst and better support for Windows #9488 (@sgugger)
- Enable TruncationStrategy override for pipelines #9432 (@Narsil)
- [doc] How To Request Support document stab #9288 (@stas00)
- [trainer] remove `--model_parallel` #9451 (@stas00)
- Fix cardinality #9505 (@jplu)
- Make doc styler behave properly on Windows #9516 (@sgugger)
- [trainer] round numbers in trainer state #9491 (@stas00)
- [make docs] parallel build #9522 (@stas00)
- [TFBart] Split TF-Bart #9497 (@patrickvonplaten)
- [ProphetNet] Fix naming and wrong config #9514 (@patrickvonplaten)
- Update 'Develop on Windows' guidelines #9519 (@SBrandeis)
- Shouldn't stale issues/PRs with feature request label #9511 (@LysandreJik)
- [Blenderbot] Fix Links #9532 (@patrickvonplaten)
- [T5] enable T5 fp16 #9487 (@patil-suraj)
- LayoutLM Config #9539 (@LysandreJik)
- Fix fill mask pipeline slow test using deprecated argument #9541 (@LysandreJik)
- Refactor `prepare_seq2seq_batch` #9524 (@sgugger)
- Use the right version of tokenizers #9550 (@sgugger)
- fix BlenderbotSmallTokenizer #9538 (@patil-suraj)
- Doc: Update pretrained_models wording #9545 (@julien-c)
- Fix barthez tokenizer #9562 (@LysandreJik)
- Fix classification script: enable dynamic padding with truncation #9554 (@pashok3d)
- Speed up TopKLogitsWarper and TopPLogitsWarper (pytorch) #9557 (@LSinev)
- Update run_glue for do_predict with local test data (#9442) #9486 (@forest1988)
- [CI] use correct deps for torchhub #9552 (@stas00)
- Fix data parallelism in Trainer #9566 (@sgugger)
- Fix slow tests v4.2.0 #9561 (@LysandreJik)