TrOCR and VisionEncoderDecoderModel
One new model is released as part of the TrOCR implementation: `TrOCRForCausalLM`, in PyTorch. It comes along with a new `VisionEncoderDecoderModel` class, which allows mixing and matching any vision Transformer encoder with any text Transformer as decoder, similar to the existing `SpeechEncoderDecoderModel` class.
The TrOCR model was proposed in TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models, by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
The TrOCR model consists of an image Transformer encoder and an autoregressive text Transformer decoder to perform optical character recognition (OCR) in an end-to-end manner.
- Add TrOCR + VisionEncoderDecoderModel by @NielsRogge in #13874
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?other=trocr
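As a quick illustration of how the new classes fit together, here is a minimal inference sketch (assuming the `microsoft/trocr-base-handwritten` checkpoint from the Hub filter above and a local image file, `line.png`, containing a single line of text):

```python
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Example checkpoint from the Hub filter above; any compatible TrOCR checkpoint works.
checkpoint = "microsoft/trocr-base-handwritten"
processor = TrOCRProcessor.from_pretrained(checkpoint)
model = VisionEncoderDecoderModel.from_pretrained(checkpoint)

# "line.png" is a placeholder path to an image of a single text line.
image = Image.open("line.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# The vision encoder reads the image; the text decoder generates the characters autoregressively.
generated_ids = model.generate(pixel_values)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```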
SEW & SEW-D
SEW and SEW-D (Squeezed and Efficient Wav2Vec) were proposed in Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
SEW and SEW-D models use a Wav2Vec-style feature encoder and introduce temporal downsampling to reduce the sequence length processed by the transformer encoder. SEW-D additionally replaces the transformer encoder with a DeBERTa one. Both models achieve significant inference speedups without sacrificing speech recognition quality.
Compatible checkpoints are available on the Hub: https://huggingface.co/models?other=sew and https://huggingface.co/models?other=sew-d
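A minimal transcription sketch with one of these checkpoints might look as follows (the checkpoint name and the dummy dataset are examples; any CTC-fine-tuned SEW or SEW-D checkpoint from the filters above should work, and a datasets version with the new Audio feature is assumed):

```python
import torch
from datasets import load_dataset
from transformers import SEWDForCTC, Wav2Vec2Processor

# Example checkpoint name; substitute any CTC-fine-tuned SEW/SEW-D checkpoint from the Hub filters above.
checkpoint = "asapp/sew-d-tiny-100k-ft-ls100h"
processor = Wav2Vec2Processor.from_pretrained(checkpoint)
model = SEWDForCTC.from_pretrained(checkpoint)

# A single 16 kHz utterance from a tiny dummy dataset.
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
inputs = processor(ds[0]["audio"]["array"], sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
print(processor.batch_decode(torch.argmax(logits, dim=-1))[0])
```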
DistilHuBERT
DistilHuBERT was proposed in DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT, by Heng-Jui Chang, Shu-wen Yang, Hung-yi Lee.
DistilHuBERT is a distilled version of the HuBERT model. Using only two transformer layers, the model scores competitively on the SUPERB benchmark tasks.
A compatible checkpoint is available on the Hub: https://huggingface.co/ntu-spml/distilhubert
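Since DistilHuBERT is primarily a speech representation model, a minimal sketch of extracting hidden states from the checkpoint above could look like this (the zero waveform is only a stand-in for real 16 kHz audio):

```python
import numpy as np
import torch
from transformers import AutoModel, Wav2Vec2FeatureExtractor

# The checkpoint listed above; DistilHuBERT loads through the standard HuBERT classes.
model = AutoModel.from_pretrained("ntu-spml/distilhubert")
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("ntu-spml/distilhubert")

# One second of silence at 16 kHz stands in for a real utterance.
waveform = np.zeros(16_000, dtype=np.float32)
inputs = feature_extractor(waveform, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    hidden_states = model(**inputs).last_hidden_state  # frame-level speech representations
print(hidden_states.shape)
```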
TensorFlow improvements
Several bug fixes and UX improvements for TensorFlow
Keras callback
Introduction of a Keras callback to push to the Hub after each epoch, or after a given number of steps:
- Keras callback to push to hub each epoch, or after N steps by @Rocketknight1 in #13773
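A rough usage sketch, assuming `model`, `tokenizer`, `train_dataset` and `eval_dataset` are already defined, and that the callback arguments match the ones introduced in the PR above:

```python
from transformers.keras_callbacks import PushToHubCallback

# Push the model to the Hub after every epoch (save_strategy="steps" plus save_steps=N pushes every N steps).
push_to_hub_callback = PushToHubCallback(
    output_dir="./my-finetuned-model",  # local directory checkpoints are written to before being pushed
    save_strategy="epoch",
    tokenizer=tokenizer,                # assumed to be defined earlier; uploaded alongside the model
)

# model, train_dataset and eval_dataset are assumed to be a compiled Keras transformers model and tf.data datasets.
model.fit(train_dataset, validation_data=eval_dataset, epochs=3, callbacks=[push_to_hub_callback])
```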
Updates on the encoder-decoder framework
The encoder-decoder framework is now available in TensorFlow, allowing users to mix and match different encoders and decoders into a single encoder-decoder architecture!
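For example, a TensorFlow seq2seq model can be warm-started from two pretrained checkpoints (a minimal sketch; the decoder's cross-attention layers are newly initialized, so the combined model needs fine-tuning before use):

```python
from transformers import TFEncoderDecoderModel

# Warm-start a TensorFlow encoder-decoder model from two pretrained checkpoints;
# any compatible encoder/decoder pair can be substituted for the two names below.
model = TFEncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "bert-base-uncased")
model.save_pretrained("tf-bert2bert")  # reloadable later as a single encoder-decoder model
```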
Besides this, the `EncoderDecoderModel` classes have been updated to work like models such as BART and T5. From now on, users don't need to pass `decoder_input_ids` to the model themselves anymore. Instead, they are created automatically based on the `labels` (namely by shifting them one position to the right, replacing -100 by the `pad_token_id` and prepending the `decoder_start_token_id`). Note that this may result in training discrepancies when fine-tuning a model trained with versions prior to 4.12.0 that set `decoder_input_ids = labels`.
- Fix EncoderDecoderModel classes to be more like BART and T5 by @NielsRogge in #14139
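For illustration, the transformation described above amounts to something like the following (a sketch of the logic, not the library's internal helper):

```python
import torch

def shift_labels_right(labels: torch.Tensor, pad_token_id: int, decoder_start_token_id: int) -> torch.Tensor:
    """Illustrates how decoder_input_ids are now derived from labels."""
    shifted = labels.new_zeros(labels.shape)
    shifted[:, 1:] = labels[:, :-1].clone()               # shift every token one position to the right
    shifted[:, 0] = decoder_start_token_id                # prepend the decoder start token
    shifted.masked_fill_(shifted == -100, pad_token_id)   # -100 (ignored in the loss) becomes padding
    return shifted

labels = torch.tensor([[42, 43, 44, -100]])
print(shift_labels_right(labels, pad_token_id=0, decoder_start_token_id=2))
# tensor([[ 2, 42, 43, 44]])
```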
Speech improvements
- Add DistilHuBERT by @anton-l in #14174
- [Speech Examples] Add pytorch speech pretraining by @patrickvonplaten in #13877
- [Speech Examples] Add new audio feature by @patrickvonplaten in #14027
- Add ASR colabs by @patrickvonplaten in #14067
- [ASR] Make speech recognition example more general to load any tokenizer by @patrickvonplaten in #14079
- [Examples] Add an official audio classification example by @anton-l in #13722
- [Examples] Use Audio feature in speech classification by @anton-l in #14052
Auto-model API
To make it easier to extend the Transformers library, every Auto class has a new `register` method that allows you to register your own custom models, configurations, or tokenizers. See more in the documentation.
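A minimal sketch of the new API, with hypothetical `MyCustomConfig` / `MyCustomModel` classes standing in for a real custom architecture:

```python
from transformers import AutoConfig, AutoModel, PretrainedConfig, PreTrainedModel

# Hypothetical custom classes, just to illustrate the register API.
class MyCustomConfig(PretrainedConfig):
    model_type = "my-custom-model"

class MyCustomModel(PreTrainedModel):
    config_class = MyCustomConfig

# Register the custom configuration and model with the Auto classes,
# so AutoConfig/AutoModel can resolve "my-custom-model" like any built-in architecture.
AutoConfig.register("my-custom-model", MyCustomConfig)
AutoModel.register(MyCustomConfig, MyCustomModel)
```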
Bug fixes and improvements
- Fix filtering in test fetcher utils by @sgugger in #13766
- Fix warning for gradient_checkpointing by @sgugger in #13767
- Implement len in IterableDatasetShard by @sgugger in #13780
- [Wav2Vec2] Better error message by @patrickvonplaten in #13777
- Fix LayoutLM ONNX test error by @nishprabhu in #13710
- Enable readme link synchronization by @qqaatw in #13785
- Fix length of IterableDatasetShard and add test by @sgugger in #13792
- [docs/gpt-j] addd instructions for how minimize CPU RAM usage by @patil-suraj in #13795
- [examples `run_glue.py`] missing requirements `scipy`, `sklearn` by @stas00 in #13768
- [examples/flax] use Repository API for push_to_hub by @patil-suraj in #13672
- Fix gather for TPU by @sgugger in #13813
- [testing] auto-replay captured streams by @stas00 in #13803
- Add MultiBERTs conversion script by @gchhablani in #13077
- [Examples] Improve mapping in accelerate examples by @patrickvonplaten in #13810
- [DPR] Correct init by @patrickvonplaten in #13796
- skip gptj slow generate tests by @patil-suraj in #13809
- Fix warning situation: UserWarning: max_length is ignored when padding=True" by @shirayu in #13829
- Updating CITATION.cff to fix GitHub citation prompt BibTeX output. by @arfon in #13833
- Add TF notebooks by @Rocketknight1 in #13793
- Bart: check if decoder_inputs_embeds is set by @silviu-oprea in #13800
- include megatron_gpt2 in installed modules by @stas00 in #13834
- Delete MultiBERTs conversion script by @gchhablani in #13852
- Remove a duplicated bullet point in the GPT-J doc by @yaserabdelaziz in #13851
- Add Mistral GPT-2 Stability Tweaks by @siddk in #13573
- Fix broken link to distill models in docs by @Randl in #13848
- ✨ update image classification example by @nateraw in #13824
- Update no_* argument (HfArgumentParser) by @BramVanroy in #13865
- Update Tatoeba conversion by @Traubert in #13757
- Fixing 1-length special tokens cut. by @Narsil in #13862
- Fix flax summarization example: save checkpoint after each epoch and push checkpoint to the hub by @ydshieh in #13872
- Fixing empty prompts for text-generation when BOS exists. by @Narsil in #13859
- Improve error message when loading models from Hub by @aphedges in #13836
- Initial support for symbolic tracing with torch.fx allowing dynamic axes by @michaelbenayoun in #13579
- Allow dataset to be an optional argument for (Distributed)LengthGroupedSampler by @ZhaofengWu in #13820
- Fixing question-answering with long contexts by @Narsil in #13873
- fix(integrations): consider test metrics by @borisdayma in #13888
- fix: replace asserts by value error by @m5l14i11 in #13894
- Update parallelism.md by @hyunwoongko in #13892
- Autodocument the list of ONNX-supported models by @sgugger in #13884
- Fixing GPU for token-classification in a better way. by @Narsil in #13856
- Update FSNER code in examples->research_projects->fsner by @sayef in #13864
- Replace assert statements with exceptions by @ddrm86 in #13871
- Fixing Backward compatiblity for zero-shot by @Narsil in #13855
- Update run_qa.py - CorrectTypo by @akulagrawal in #13857
- T5ForConditionalGeneration: enabling using past_key_values and labels in training by @yssjtu in #13805
- Fix trainer logging_nan_inf_filter in torch_xla mode by @ymwangg in #13896
- Fix hp search for non sigopt backends by @sgugger in #13897
- [Trainer] Fix nan-loss condition by @anton-l in #13911
- Raise exceptions instead of asserts in utils/download_glue_data by @hirotasoshu in #13907
- Add an example of exporting BartModel + BeamSearch to ONNX module. by @fatcat-z in #13765
- #12789 Replace assert statements with exceptions by @djroxx2000 in #13909
- Add missing whitespace to multiline strings by @aphedges in #13916
- [Wav2Vec2] Fix mask_feature_prob by @patrickvonplaten in #13921
- Fixes a minor doc issue (missing character) by @mishig25 in #13922
- Fix LED by @Rocketknight1 in #13882
- Add BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese by @datquocnguyen in #13788
- [trainer] memory metrics: add memory at the start report by @stas00 in #13915
- Image Segmentation pipeline by @mishig25 in #13828
- Adding support for tokens being suffixes or part of each other. by @Narsil in #13918
- Adds `PreTrainedModel.framework` attribute by @StellaAthena in #13817
- Fixed typo: herBERT -> HerBERT by @adamjankaczmarek in #13936
- [Generation] Fix max_new_tokens by @patrickvonplaten in #13919
- Fix typo in README.md by @fullyz in #13883
- Update bug-report.md by @LysandreJik in #13934
- fix issue #13904 -attribute does not exist- by @oraby8 in #13942
- Raise ValueError instead of asserts in src/transformers/benchmark/benchmark.py by @AkechiShiro in #13951
- Honor existing attention mask in tokenzier.pad by @sgugger in #13926
- [Gradient checkpoining] Correct disabling `find_unused_parameters` in Trainer when gradient checkpointing is enabled by @patrickvonplaten in #13961
- Change DataCollatorForSeq2Seq to pad labels to a multiple of `pad_to_multiple_of` by @affjljoo3581 in #13949
- Replace assert with unittest assertions by @LuisFerTR in #13957
- Raise exceptions instead of asserts in src/transformers/data/processors/xnli.py by @midhun1998 in #13945
- Make username optional in hub_model_id by @sgugger in #13940
- Raise exceptions instead of asserts in src/transformers/data/processors/utils.py by @killazz67 in #13938
- Replace assert by ValueError of src/transformers/models/electra/modeling_{electra,tf_electra}.py and all other models that had copies by @AkechiShiro in #13955
- Fix missing tpu variable in benchmark_args_tf.py by @hardianlawi in #13968
- Specify im-seg mask greyscole mode by @mishig25 in #13974
- [Wav2Vec2] Make sure tensors are always bool for mask_indices by @patrickvonplaten in #13977
- Fixing the lecture values by making sure defaults are not changed by @Narsil in #13976
- [parallel doc] dealing with layers larger than one gpu by @stas00 in #13980
- Remove wrong model_args supplied by @qqaatw in #13937
- Allow single byte decoding by @patrickvonplaten in #13988
- Replace assertion with ValueError exception by @ddrm86 in #14006
- Add strong test for configuration attributes by @sgugger in #14000
- Fix FNet tokenizer tests by @LysandreJik in #13995
- [Testing] Move speech datasets to `hf-internal` testing ... by @patrickvonplaten in #14008
- Raise exceptions instead of asserts in src/transformers/models/bart/modeling_flax_[bart, marian, mbart, pegasus].py by @killazz67 in #13939
- Scatter dummies + skip pipeline tests by @LysandreJik in #13996
- Fixed horizon_length for PPLM by @jacksukk in #13886
- Fix: replace assert statements with exceptions in file src/transformers/models/lxmert/modeling_lxmert.py by @murilo-goncalves in #14029
- [Docs] More general docstrings by @patrickvonplaten in #14028
- [CLIP] minor fixes by @patil-suraj in #14026
- Don't duplicate the elements in dir by @sgugger in #14023
- Replace assertions with ValueError exceptions by @ddrm86 in #14018
- Fixes typo in `modeling_speech_to_text` by @mishig25 in #14044
- [Speech] Move all examples to new audio feature by @patrickvonplaten in #14045
- Update SEW integration test tolerance by @anton-l in #14048
- [Flax] Clip fix test by @patrickvonplaten in #14046
- Fix save when laod_best_model_at_end=True by @sgugger in #14054
- [Speech] Refactor Examples by @patrickvonplaten in #14040
- fix typo by @yyy-Apple in #14049
- Fix typo by @ihoromi4 in #14056
- [FX] Fix passing None as concrete args when tracing by @thomasw21 in #14022
- TF Model train and eval step metrics for seq2seq models. by @pedro-r-marques in #14009
- update to_py_obj to support np.number by @PrettyMeng in #14064
- Trainer._load_rng_state() path fix (#14069) by @tlby in #14071
- replace assert with exception in src/transformers/utils/model_pararallel_utils.py by @skpig in #14072
- Add missing autocast() in Trainer.prediction_step() by @juice500ml in #14075
- Fix assert in src/transformers/data/datasets/language_modeling.py by @skpig in #14077
- Fix label attribution in token classification examples by @sgugger in #14055
- Context managers by @lvwerra in #13900
- Fix broken link in the translation section of task summaries by @h4iku in #14087
- [ASR] Small fix model card creation by @patrickvonplaten in #14093
- Change asserts in src/transformers/models/xlnet/ to raise ValueError by @WestonKing-Leatham in #14088
- Replace assertions with ValueError exceptions by @ddrm86 in #14061
- [Typo] Replace "Masked" with "Causal" in TF CLM script by @cakiki in #14014
- [Examples] Add audio classification notebooks by @anton-l in #14099
- Fix ignore_mismatched_sizes by @qqaatw in #14085
- Fix typo in comment by @stalkermustang in #14102
- Replace assertion with ValueError exception by @ddrm86 in #14098
- fix typo in license docstring by @21jun in #14094
- Fix a typo in preprocessing docs by @h4iku in #14108
- Replace assertions with ValueError exceptions by @iDeepverma in #14091
- [tests] fix hubert test sort by @patrickvonplaten in #14116
- Replace assert statements with exceptions (#13871) by @ddrm86 in #13901
- Translate README.md to Korean by @yeounyi in #14015
- Replace assertions with valueError Exeptions by @jyshdewangan in #14117
- Fix assertion in models by @skpig in #14090
- [wav2vec2] Add missing --validation_split_percentage data arg by @falcaopetri in #14119
- Rename variables with unclear naming by @qqaatw in #14122
- Update TP parallel GEMM image by @hyunwoongko in #14112
- Fix some typos in the docs by @h4iku in #14126
- Supporting Seq2Seq model for question answering task by @karthikrangasai in #13432
- Fix rendering of examples version links by @h4iku in #14134
- Fix some writing issues in the docs by @h4iku in #14136
- BartEnocder add set_input_embeddings by @Liangtaiwan in #13960
- Remove unneeded `to_tensor()` in TF inline example by @Rocketknight1 in #14140
- Enable DefaultDataCollator class by @Rocketknight1 in #14141
- Fix lazy init to stop hiding errors in import by @sgugger in #14124
- Add TF<>PT and Flax<>PT everywhere by @patrickvonplaten in #14047
- Add Camembert to models exportable with ONNX by @ChainYo in #14059
- [Speech Recognition CTC] Add auth token to fine-tune private models by @patrickvonplaten in #14154
- Add vision_encoder_decoder to models/init.py by @ydshieh in #14151
- [Speech Recognition] - Distributed training: Make sure vocab file removal and creation don't interfer by @patrickvonplaten in #14161
- Include Keras tensor in the allowed types by @sergiovalmac in #14155
- [megatron_gpt2] dynamic gelu, add tokenizer, save config by @stas00 in #13928
- Add Unispeech & Unispeech-SAT by @patrickvonplaten in #13963
- [ONNX] Add symbolic function for XSoftmax op for exporting to ONNX. by @fatcat-z in #14013
- Typo on ner accelerate example code by @monologg in #14150
- fix typos in error messages in speech recognition example and modelcard.py by @mgoldey in #14166
- Replace assertions with ValueError exception by @huberemanuel in #14142
- switch to inference_mode from no_gard by @kamalkraj in #13667
- Fix gelu test for torch 1.10 by @LysandreJik in #14167
- [Gradient checkpointing] Enable for Deberta + DebertaV2 + SEW-D by @patrickvonplaten in #14175
- [Pipelines] Fix ASR model types check by @anton-l in #14178
- Replace assert of data/data_collator.py by ValueError by @AkechiShiro in #14131
- [TPU tests] Enable first TPU examples pytorch by @patrickvonplaten in #14121
- [modeling_utils] respect original dtype in _get_resized_lm_head by @stas00 in #14181
New Contributors
- @arfon made their first contribution in #13833
- @silviu-oprea made their first contribution in #13800
- @yaserabdelaziz made their first contribution in #13851
- @Randl made their first contribution in #13848
- @Traubert made their first contribution in #13757
- @ZhaofengWu made their first contribution in #13820
- @m5l14i11 made their first contribution in #13894
- @hyunwoongko made their first contribution in #13892
- @ddrm86 made their first contribution in #13871
- @akulagrawal made their first contribution in #13857
- @yssjtu made their first contribution in #13805
- @ymwangg made their first contribution in #13896
- @hirotasoshu made their first contribution in #13907
- @fatcat-z made their first contribution in #13765
- @djroxx2000 made their first contribution in #13909
- @adamjankaczmarek made their first contribution in #13936
- @oraby8 made their first contribution in #13942
- @AkechiShiro made their first contribution in #13951
- @affjljoo3581 made their first contribution in #13949
- @LuisFerTR made their first contribution in #13957
- @midhun1998 made their first contribution in #13945
- @killazz67 made their first contribution in #13938
- @hardianlawi made their first contribution in #13968
- @jacksukk made their first contribution in #13886
- @murilo-goncalves made their first contribution in #14029
- @yyy-Apple made their first contribution in #14049
- @ihoromi4 made their first contribution in #14056
- @thomasw21 made their first contribution in #14022
- @pedro-r-marques made their first contribution in #14009
- @PrettyMeng made their first contribution in #14064
- @tlby made their first contribution in #14071
- @skpig made their first contribution in #14072
- @juice500ml made their first contribution in #14075
- @h4iku made their first contribution in #14087
- @WestonKing-Leatham made their first contribution in #14088
- @cakiki made their first contribution in #14014
- @stalkermustang made their first contribution in #14102
- @iDeepverma made their first contribution in #14091
- @yeounyi made their first contribution in #14015
- @jyshdewangan made their first contribution in #14117
- @karthikrangasai made their first contribution in #13432
- @ChainYo made their first contribution in #14059
- @sergiovalmac made their first contribution in #14155
- @huberemanuel made their first contribution in #14142
Full Changelog: v4.11.0...v4.12.0