v4.23.0: Whisper, Deformable DETR, Conditional DETR, MarkupLM, MSN, `safetensors`


Whisper

The Whisper model was proposed in Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever.

Whisper is an encoder-decoder Transformer trained on 680,000 hours of labeled (transcribed) audio. The model shows impressive performance and robustness in a zero-shot setting, across multiple languages.
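A minimal transcription sketch is below; the `openai/whisper-tiny.en` checkpoint and the silent placeholder waveform are illustrative, so substitute your own 16 kHz audio:

```python
import numpy as np
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny.en")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")

# placeholder: one second of silence at 16 kHz; use a real waveform in practice
waveform = np.zeros(16_000, dtype=np.float32)

inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    predicted_ids = model.generate(inputs.input_features)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```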

Deformable DETR

The Deformable DETR model was proposed in Deformable DETR: Deformable Transformers for End-to-End Object Detection by Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai.

Deformable DETR mitigates the slow convergence issues and limited feature spatial resolution of the original DETR by leveraging a new deformable attention module which only attends to a small set of key sampling points around a reference.
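A minimal sketch, assuming the `SenseTime/deformable-detr` checkpoint and a COCO sample image:

```python
import requests
import torch
from PIL import Image
from transformers import AutoFeatureExtractor, DeformableDetrForObjectDetection

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

feature_extractor = AutoFeatureExtractor.from_pretrained("SenseTime/deformable-detr")
model = DeformableDetrForObjectDetection.from_pretrained("SenseTime/deformable-detr")

inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# raw per-query class logits and normalized boxes; see the post-processing
# section below for converting these into absolute (score, label, box) triples
print(outputs.logits.shape, outputs.pred_boxes.shape)
```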

Conditional DETR

The Conditional DETR model was proposed in Conditional DETR for Fast Training Convergence by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang.

Conditional DETR presents a conditional cross-attention mechanism for fast DETR training. Conditional DETR converges 6.7× to 10× faster than DETR.
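On the modeling side, Conditional DETR is a drop-in alternative to DETR; a short sketch with the `microsoft/conditional-detr-resnet-50` checkpoint:

```python
from transformers import AutoFeatureExtractor, ConditionalDetrForObjectDetection

feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/conditional-detr-resnet-50")
model = ConditionalDetrForObjectDetection.from_pretrained("microsoft/conditional-detr-resnet-50")
# the forward pass and post-processing then mirror the Deformable DETR sketch above
```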

Time Series Transformer

The Time Series Transformer model is a vanilla encoder-decoder Transformer for time series forecasting.

The model is trained in a similar way to how one would train an encoder-decoder Transformer (like T5 or BART) for machine translation; i.e. teacher forcing is used. At inference time, one can autoregressively generate samples, one time step at a time.

⚠️ This is a recently introduced model and modality, so the API hasn't been tested extensively. There may be some bugs or slight breaking changes in the future. If you see something strange, file a GitHub issue.
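To make the expected input layout concrete, here is a minimal sketch of the training-time forward pass on random data; all configuration values are illustrative:

```python
import torch
from transformers import TimeSeriesTransformerConfig, TimeSeriesTransformerForPrediction

# illustrative configuration: univariate series with one time feature (e.g. day of week)
config = TimeSeriesTransformerConfig(
    prediction_length=4,
    context_length=8,
    lags_sequence=[1, 2, 3],
    num_time_features=1,
)
model = TimeSeriesTransformerForPrediction(config)

# the past window must cover context_length + max(lags_sequence) steps
batch_size, past_len = 2, 8 + 3
outputs = model(
    past_values=torch.randn(batch_size, past_len),
    past_time_features=torch.randn(batch_size, past_len, 1),
    past_observed_mask=torch.ones(batch_size, past_len),
    future_values=torch.randn(batch_size, 4),  # teacher forcing during training
    future_time_features=torch.randn(batch_size, 4, 1),
)
print(outputs.loss)  # negative log-likelihood to backpropagate
```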

Masked Siamese Networks

The ViTMSN model was proposed in Masked Siamese Networks for Label-Efficient Learning by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas.

MSN (Masked Siamese Networks) consists of a joint-embedding architecture that matches the prototypes of masked patches with those of the unmasked patches. With this setup, the method yields excellent performance in the low-shot and extreme low-shot regimes for image classification, outperforming other self-supervised methods such as DINO. For instance, with 1% of ImageNet-1K labels, the method achieves 75.7% top-1 accuracy.
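A short classification sketch with the `facebook/vit-msn-small` backbone. Since MSN is self-supervised, the classification head below is newly initialized and should be fine-tuned before its predictions mean anything:

```python
import requests
import torch
from PIL import Image
from transformers import AutoFeatureExtractor, ViTMSNForImageClassification

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/vit-msn-small")
model = ViTMSNForImageClassification.from_pretrained("facebook/vit-msn-small")  # head is randomly initialized

inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```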

MarkupLM

The MarkupLM model was proposed in MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei.

MarkupLM is BERT, but applied to HTML pages instead of raw text documents. Similar to LayoutLM, the model incorporates additional embedding layers to improve performance.

The model can be used for tasks such as question answering on web pages or information extraction from web pages. It obtains state-of-the-art results on two important benchmarks: WebSRC and SWDE.
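A web-page question answering sketch, assuming the `microsoft/markuplm-base-finetuned-websrc` checkpoint (the processor parses HTML with beautifulsoup4, which is now a dependency):

```python
import torch
from transformers import MarkupLMProcessor, MarkupLMForQuestionAnswering

processor = MarkupLMProcessor.from_pretrained("microsoft/markuplm-base-finetuned-websrc")
model = MarkupLMForQuestionAnswering.from_pretrained("microsoft/markuplm-base-finetuned-websrc")

html = "<html><body><h1>My name is Niels</h1></body></html>"
encoding = processor(html, questions="What's his name?", return_tensors="pt")
with torch.no_grad():
    outputs = model(**encoding)

# decode the highest-scoring answer span
start, end = outputs.start_logits.argmax(), outputs.end_logits.argmax()
print(processor.decode(encoding.input_ids[0, start : end + 1]).strip())
```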

Security & safety

We are exploring a new serialization format that does not rely on pickle and that we can leverage across the three frameworks we support: PyTorch, TensorFlow, and JAX. It is backed by the `safetensors` library.

Support covers PyTorch models only at this stage and is still experimental.
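A minimal sketch of the format itself, using the `safetensors` library directly with PyTorch tensors:

```python
import torch
from safetensors.torch import load_file, save_file

# serialize a state dict without pickle, then load it back
tensors = {"embedding.weight": torch.zeros((1024, 768))}
save_file(tensors, "model.safetensors")
reloaded = load_file("model.safetensors")
```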

Computer vision post-processing methods overhaul

The processors for computer vision have been overhauled to ensure they have consistent naming, input arguments, and outputs.
⚠️ The existing methods superseded by the newly introduced `post_process_object_detection`, `post_process_semantic_segmentation`, `post_process_instance_segmentation`, and `post_process_panoptic_segmentation` methods are now deprecated.
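For example, object-detection post-processing for DETR now goes through `post_process_object_detection`; a sketch with the `facebook/detr-resnet-50` checkpoint:

```python
import requests
import torch
from PIL import Image
from transformers import AutoFeatureExtractor, DetrForObjectDetection

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

with torch.no_grad():
    outputs = model(**feature_extractor(images=image, return_tensors="pt"))

# convert raw outputs to (score, label, box) with boxes in absolute (x_min, y_min, x_max, y_max) coordinates
target_sizes = torch.tensor([image.size[::-1]])  # (height, width) per image
results = feature_extractor.post_process_object_detection(
    outputs, threshold=0.9, target_sizes=target_sizes
)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(model.config.id2label[label.item()], round(score.item(), 3), box.tolist())
```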

🚨 Breaking changes

The following changes are bugfixes that we have chosen to fix even though doing so changes the resulting behavior. We mark them as breaking changes, so if you use this part of the codebase, we recommend you take a look at the PRs to understand exactly what was changed.

Breaking change for ViT parameter initialization

Breaking change for the `top_p` argument of the `TopPLogitsWarper` of the `generate` method.

Model head additions

OPT and BLOOM now have question answering heads available.
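A short sketch with `OPTForQuestionAnswering`; the base `facebook/opt-350m` checkpoint has no QA head, so the span-classification head below is newly initialized and needs fine-tuning on an extractive QA dataset before use:

```python
import torch
from transformers import AutoTokenizer, OPTForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = OPTForQuestionAnswering.from_pretrained("facebook/opt-350m")  # QA head is randomly initialized

question, context = "Who wrote Hamlet?", "Hamlet is a tragedy written by William Shakespeare."
inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# pick the highest-scoring start/end positions and decode the span
start, end = outputs.start_logits.argmax(), outputs.end_logits.argmax()
print(tokenizer.decode(inputs.input_ids[0, start : end + 1]))
```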

Pipelines

There is now a zero-shot object detection pipeline.
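A sketch built on OWL-ViT, the model the pipeline wraps. The pipeline was brand new at this point and its call signature was still settling, so treat the query argument name as version-dependent (`text_queries` here; renamed to `candidate_labels` in later releases):

```python
from transformers import pipeline

detector = pipeline(task="zero-shot-object-detection", model="google/owlvit-base-patch32")

predictions = detector(
    "http://images.cocodataset.org/val2017/000000039769.jpg",
    text_queries=["cat", "remote control"],  # `candidate_labels` in later versions
)
print(predictions)  # list of {"score", "label", "box"} dicts
```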

TensorFlow architectures

The GroupViT model is now available in TensorFlow.
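A zero-shot classification sketch with the TensorFlow port, assuming the `nvidia/groupvit-gcc-yfcc` checkpoint:

```python
import requests
import tensorflow as tf
from PIL import Image
from transformers import AutoProcessor, TFGroupViTModel

processor = AutoProcessor.from_pretrained("nvidia/groupvit-gcc-yfcc")
model = TFGroupViTModel.from_pretrained("nvidia/groupvit-gcc-yfcc")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    images=image,
    padding=True,
    return_tensors="tf",
)
outputs = model(**inputs)
probs = tf.nn.softmax(outputs.logits_per_image, axis=-1)  # image-text match probabilities
```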

Bugfixes and improvements

  • Fix a broken link for deepspeed ZeRO inference in the docs by @nijkah in #19001
  • [doc] debug: fix import by @stas00 in #19042
  • [bnb] Small improvements on utils by @younesbelkada in #18646
  • Update image segmentation pipeline test by @amyeroberts in #18731
  • Fix test_save_load for TFViTMAEModelTest by @ydshieh in #19040
  • Pin minimum PyTorch version for BLOOM ONNX export by @lewtun in #19046
  • Update serving signatures and make sure we actually use them by @Rocketknight1 in #19034
  • Move cache: expand error message by @sgugger in #19051
  • Fixing OPT fast tokenizer option. by @Narsil in #18753
  • Fix custom tokenizers test by @sgugger in #19052
  • Run torchdynamo tests by @ydshieh in #19056
  • [fix] Add DeformableDetrFeatureExtractor by @NielsRogge in #19140
  • fix arg name in BLOOM testing and remove unused arg document by @shijie-wu in #18843
  • Adds package and requirement spec output to version check exception by @colindean in #18702
  • fix use_cache by @younesbelkada in #19060
  • FX support for ConvNext, Wav2Vec2 and ResNet by @michaelbenayoun in #19053
  • [doc] Fix link in PreTrainedModel documentation by @tomaarsen in #19065
  • Add FP32 cast in ConvNext LayerNorm to prevent rounding errors with FP16 input by @jimypbr in #18746
  • Organize test jobs by @sgugger in #19058
  • Automatically tag CLIP repos as zero-shot-image-classification by @osanseviero in #19064
  • Fix LeViT checkpoint by @ydshieh in #19069
  • TF: tests for (de)serializable models with resized tokens by @gante in #19013
  • Add type hints for PyTorch UniSpeech, MPNet and Nystromformer by @daspartho in #19039
  • replace logger.warn by logger.warning by @fxmarty in #19068
  • Fix tokenizer load from one file by @sgugger in #19073
  • Note about developer mode by @LysandreJik in #19075
  • german autoclass by @flozi00 in #19049
  • Add tests for legacy load by url and fix bugs by @sgugger in #19078
  • Add runner availability check by @ydshieh in #19054
  • fix working dir by @ydshieh in #19101
  • Added type hints for TFConvBertModel by @kishore-s-15 in #19088
  • Added Type hints for VIT MAE by @kishore-s-15 in #19085
  • Add type hints for TF MPNet models by @kishore-s-15 in #19089
  • Added type hints to ResNetForImageClassification by @kishore-s-15 in #19084
  • added type hints by @daspartho in #19076
  • Improve vision models docs by @NielsRogge in #19103
  • correct spelling in README by @flozi00 in #19092
  • Don't warn of move if cache is empty by @sgugger in #19109
  • HPO: keep the original logic if there's only one process, pass the trial to trainer by @sywangyi in #19096
  • Add documentation of Trainer.create_model_card by @sgugger in #19110
  • Added type hints for YolosForObjectDetection by @kishore-s-15 in #19086
  • Fix the wrong schedule by @ydshieh in #19117
  • Change document question answering pipeline to always return an array by @ankrgyl in #19071
  • german processing by @flozi00 in #19121
  • Fix: update ltp word segmentation call in mlm_wwm by @xyh1756 in #19047
  • Add a missing space in a script arg documentation by @bryant1410 in #19113
  • Skip test_export_to_onnx for LongT5 if torch < 1.11 by @ydshieh in #19122
  • Fix GLUE MNLI when using max_eval_samples by @lvwerra in #18722
  • [BugFix] Fix fsdp option on shard_grad_op. by @ZHUI in #19131
  • Fix FlaxPretTrainedModel pt weights check by @mishig25 in #19133
  • suppoer deps from github by @lhoestq in #19141
  • Fix dummy creation for multi-frameworks objects by @sgugger in #19144
  • Allowing users to use the latest tokenizers release ! by @Narsil in #19139
  • Add some tests for check_dummies by @sgugger in #19146
  • Fixed typo in generation_utils.py by @nbalepur in #19145
  • Add accelerate support for ViLT by @younesbelkada in #18683
  • TF: check embeddings range by @gante in #19102
  • Reduce LR for TF MLM example test by @Rocketknight1 in #19156
  • update perf_train_cpu_many doc by @sywangyi in #19151
  • fix: ckpt paths. by @sayakpaul in #19159
  • Fix TrainingArguments documentation by @sgugger in #19162
  • fix HPO DDP GPU problem by @sywangyi in #19168
  • [WIP] Trainer supporting evaluation on multiple datasets by @timbmg in #19158
  • Add doctests to Perceiver examples by @stevenmanton in #19129
  • Add offline runners info in the Slack report by @ydshieh in #19169
  • Fix incorrect comments about atten mask for pytorch backend by @lygztq in #18728
  • Fixed type hint for pipelines/check_task by @Fei-Wang in #19150
  • Update run_clip.py by @enze5088 in #19130
  • german training, accelerate and model sharing by @flozi00 in #19171
  • Separate Push CI images from Scheduled CI by @ydshieh in #19170
  • Remove pos arg from Perceiver's Pre/Postprocessors by @aielawady in #18602
  • Use assertAlmostEqual in BloomEmbeddingTest.test_logits by @ydshieh in #19200
  • Move the model type check by @ankrgyl in #19027
  • Use repo_type instead of deprecated datasets repo IDs by @sgugger in #19202
  • Updated hf_argparser.py by @IMvision12 in #19188
  • Add warning for torchaudio <= 0.10 in MCTCTFeatureExtractor by @ydshieh in #19203
  • Fix cached_file in offline mode for cached non-existing files by @sgugger in #19206
  • Remove unused cur_len in generation_utils.py by @ekagra-ranjan in #18874
  • add wav2vec2_alignment by @arijitx in #16782
  • add doc for hyperparameter search by @sywangyi in #19192
  • Add a use_parallel_residual argument to control the residual computing way by @NinedayWang in #18695
  • translated add_new_pipeline by @nickprock in #19215
  • More tests for regression in cached non existence by @sgugger in #19216
  • Use math.pi instead of torch.pi in MaskFormer by @ydshieh in #19201
  • Added tests for yaml and json parser by @IMvision12 in #19219
  • Fix small use_cache typo in the docs by @ankrgyl in #19191
  • Generate: add warning when left padding should be used by @gante in #19067
  • Fix deprecation warning for return_all_scores by @ogabrielluiz in #19217
  • Fix doctest for TFDeiTForImageClassification by @ydshieh in #19173
  • Document and validate typical_p in generation by @mapmeld in #19128
  • Fix trainer seq2seq qa.py evaluate log and ft script by @iamtatsuki05 in #19208
  • Fix cache names in CircleCI jobs by @ydshieh in #19223
  • Move AutoClasses under Main Classes by @stevhliu in #19163
  • Focus doc around preprocessing classes by @stevhliu in #18768
  • Fix confusing working directory in Push CI by @ydshieh in #19234
  • XGLM - Fix Softmax NaNs when using FP16 by @gsarti in #18057
  • Add a getattr method, which replaces _module_getattr in torch.fx.Tracer from PyTorch 1.13+ by @michaelbenayoun in #19233
  • Fix m2m_100.mdx doc example missing labels by @Mustapha-AJEGHRIR in #19149
  • Fix opt softmax small nit by @younesbelkada in #19243
  • Use hf_raise_for_status instead of deprecated _raise_for_status by @Wauplin in #19244
  • Fix TrainingArgs argument serialization by @atturaioe in #19239
  • Fix test fetching for examples by @sgugger in #19237
  • Cast TF generate() inputs by @Rocketknight1 in #19232
  • Skip pipeline tests by @sgugger in #19248
  • Add job names in Past CI artifacts by @ydshieh in #19235
  • Update Past CI report script by @ydshieh in #19228
  • [Wav2Vec2] Fix None loss in doc examples by @rbsteinm in #19218
  • Catch HFValidationError in TrainingSummary by @ydshieh in #19252
  • Add expected output to the sample code for ViTMSNForImageClassification by @sayakpaul in #19183
  • Add stop sequence to text generation pipeline by @KMFODA in #18444
  • Add notebooks by @JingyaHuang in #19259
  • Add beautifulsoup4 to the dependency list by @ydshieh in #19253
  • Fix Encoder-Decoder testing issue about repo. names by @ydshieh in #19250
  • Fix cached lookup filepath on windows for hub by @kjerk in #19178
  • Docs - Guide to add a new TensorFlow model by @gante in #19256
  • Update no_trainer script for summarization by @divyanshugit in #19277
  • Don't automatically add bug label by @sgugger in #19302
  • Breakup export guide by @stevhliu in #19271
  • Update Protobuf dependency version to fix known vulnerability by @qthequartermasterman in #19247
  • Update README.md by @ShubhamJagtap2000 in #19309
  • [Docs] Fix link by @patrickvonplaten in #19313
  • Fix for sequence regression fit() in TF by @Rocketknight1 in #19316
  • Added Type hints for LED TF by @IMvision12 in #19315
  • Added type hints for TF: rag model by @debjit-bw in #19284
  • alter retrived to retrieved by @gouqi666 in #18863
  • ci(stale.yml): upgrade actions/setup-python to v4 by @oscard0m in #19281
  • ci(workflows): update actions/checkout to v3 by @oscard0m in #19280
  • wrap forward passes with torch.no_grad() by @daspartho in #19279
  • wrap forward passes with torch.no_grad() by @daspartho in #19278
  • wrap forward passes with torch.no_grad() by @daspartho in #19274
  • wrap forward passes with torch.no_grad() by @daspartho in #19273
  • Removing BertConfig inheritance from LayoutLMConfig by @arnaudstiegler in #19307
  • docker-build: Update actions/checkout to v3 by @Sushrut1101 in #19288
  • Clamping hidden state values to allow FP16 by @SSamDav in #19229
  • Remove interdependency from OpenAI tokenizer by @E-Aho in #19327
  • removing XLMConfig inheritance from FlaubertConfig by @D3xter1922 in #19326
  • Removed interdependency of BERT's Tokenizer in tokenization of prophetnet by @divyanshugit in #19331
  • Remove bert interdependency from clip tokenizer by @shyamsn97 in #19332
  • [WIP]remove XLMTokenizer inheritance from FlaubertTokenizer by @D3xter1922 in #19330
  • Making camembert independent from roberta, clean by @Mustapha-AJEGHRIR in #19337
  • Add sudachi and jumanpp tokenizers for bert_japanese by @r-terada in #19043
  • Frees LongformerTokenizer of the Roberta dependency by @srhrshr in #19346
  • Change BloomConfig docstring by @younesbelkada in #19336
  • Test failing test while we resolve the issue. by @sgugger in #19355
  • Call _set_save_spec() when creating TF models by @Rocketknight1 in #19321
  • correct typos in README by @paulaxisabel in #19304
  • Removes Roberta and Bert config dependencies from Longformer by @srhrshr in #19343
  • Fix gather for metrics by @muellerzr in #19360
  • Fix pipeline tests for Roberta-like tokenizers by @sgugger in #19365
  • Change link of repojacking vulnerable link by @Ilaygoldman in #19393
  • Making ConvBert Tokenizer independent from bert Tokenizer by @IMvision12 in #19347
  • Fix gather for metrics by @muellerzr in #19389
  • Added Type hints for XLM TF by @IMvision12 in #19333
  • add ONNX support for swin transformer by @bibhabasumohapatra in #19390
  • removes prophet config dependencies from xlm-prophet by @srhrshr in #19400
  • Added type hints for TF: TransfoXL by @thliang01 in #19380
  • HF <-> megatron checkpoint reshaping and conversion for GPT by @pacman100 in #19317
  • Remove unneded words from audio-related feature extractors by @osanseviero in #19405
  • edit: cast attention_mask to long in DataCollatorCTCWithPadding by @ddobokki in #19369
  • Copy BertTokenizer dependency into retribert tokenizer by @Davidy22 in #19371
  • Export TensorFlow models to ONNX with dynamic input shapes by @dwyatte in #19255
  • update attention mask handling by @ArthurZucker in #19385
  • Remove dependency of Bert from Squeezebert tokenizer by @rchan26 in #19403
  • Removed Bert and XML Dependency from Herbert by @harry7337 in #19410
  • Clip device map by @patrickvonplaten in #19409
  • Remove Dependency between Bart and LED (slow/fast) by @Infrared1029 in #19408
  • Removed Bert interdependency in tokenization_electra.py by @OtherHorizon in #19356
  • Make Camembert TF version independent from Roberta by @Mustapha-AJEGHRIR in #19364
  • Removed Bert dependency from BertGeneration code base. by @Threepointone4 in #19370
  • Rework pipeline tests by @sgugger in #19366
  • Fix ViTMSNForImageClassification doctest by @ydshieh in #19275
  • Skip BloomEmbeddingTest.test_embeddings for PyTorch < 1.10 by @ydshieh in #19261
  • remove RobertaConfig inheritance from MarkupLMConfig by @D3xter1922 in #19404
  • Backtick fixed (paragraph 68) by @kant in #19440
  • Fixed duplicated line (paragraph #83) Documentation: @sgugger by @kant in #19436
  • fix marianMT convertion to onnx by @kventinel in #19287
  • Fix typo in image-classification/README.md by @zhawe01 in #19424
  • Stop relying on huggingface_hub's private methods by @LysandreJik in #19392
  • Add onnx support for VisionEncoderDecoder by @mht-sharma in #19254
  • Remove dependency of Roberta in Blenderbot by @rchan26 in #19411
  • fix: renamed variable name by @ariG23498 in #18850
  • Fix the error message in run_t5_mlm_flax.py by @yangky11 in #19282
  • Add Italian translation for add_new_model.mdx by @Steboss89 in #18713
  • Fix momentum and epsilon values by @amyeroberts in #19454
  • Generate: corrected exponential_decay_length_penalty type hint by @ShivangMishra in #19376
  • Fix misspelled word in docstring by @Bearnardd in #19415
  • Fixed a non-working hyperlink in the README.md file by @MikailINTech in #19434
  • fix by @ydshieh in #19469
  • wrap forward passes with torch.no_grad() by @daspartho in #19439
  • wrap forward passes with torch.no_grad() by @daspartho in #19438
  • wrap forward passes with torch.no_grad() by @daspartho in #19416
  • wrap forward passes with torch.no_grad() by @daspartho in #19414
  • wrap forward passes with torch.no_grad() by @daspartho in #19413
  • wrap forward passes with torch.no_grad() by @daspartho in #19412

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @flozi00
    • german autoclass (#19049)
    • correct spelling in README (#19092)
    • german processing (#19121)
    • german training, accelerate and model sharing (#19171)
  • @DeppMeng
    • Add support for conditional detr (#18948)
  • @sayakpaul
    • MSN (Masked Siamese Networks) for ViT (#18815)
    • fix: ckpt paths. (#19159)
    • Add expected output to the sample code for ViTMSNForImageClassification (#19183)
  • @IMvision12
    • Updated hf_argparser.py (#19188)
    • Added tests for yaml and json parser (#19219)
    • Added Type hints for LED TF (#19315)
    • Making ConvBert Tokenizer independent from bert Tokenizer (#19347)
    • Added Type hints for XLM TF (#19333)
  • @ariG23498
    • [TensorFlow] Adding GroupViT (#18020)
    • fix: renamed variable name (#18850)
  • @Mustapha-AJEGHRIR
    • Fix m2m_100.mdx doc example missing labels (#19149)
    • Making camembert independent from roberta, clean (#19337)
    • Make Camembert TF version independent from Roberta (#19364)
  • @D3xter1922
    • removing XLMConfig inheritance from FlaubertConfig (#19326)
    • [WIP]remove XLMTokenizer inheritance from FlaubertTokenizer (#19330)
    • remove RobertaConfig inheritance from MarkupLMConfig (#19404)
  • @srhrshr
    • Frees LongformerTokenizer of the Roberta dependency (#19346)
    • Removes Roberta and Bert config dependencies from Longformer (#19343)
    • removes prophet config dependencies from xlm-prophet (#19400)
  • @sahamrit
    • [WIP] Add ZeroShotObjectDetectionPipeline (#18445) (#18930)
  • @Davidy22
    • Copy BertTokenizer dependency into retribert tokenizer (#19371)
  • @rchan26
    • Remove dependency of Bert from Squeezebert tokenizer (#19403)
    • Remove dependency of Roberta in Blenderbot (#19411)
  • @harry7337
    • Removed Bert and XML Dependency from Herbert (#19410)
  • @Infrared1029
    • Remove Dependency between Bart and LED (slow/fast) (#19408)
  • @Steboss89
    • Add Italian translation for add_new_model.mdx (#18713)
