pypi transformers 4.27.0
BridgeTower, Whisper speedup, DETA, SpeechT5, BLIP-2, CLAP, ALIGN, API updates


BridgeTower

The goal of this model is to build a bridge between each uni-modal encoder and the cross-modal encoder to enable comprehensive and detailed interaction at each layer of the cross-modal encoder, thus achieving remarkable performance on various downstream tasks with almost negligible additional parameters and computational costs.

Whisper speedup

The Whisper model was integrated a few releases ago. This release offers significant performance optimizations when generating with timestamps. This was made possible by rewriting Whisper's generate() function, which now uses the generation_config and implements batched timestamp prediction. The language and task can now also be set when calling generate(). For more details about this refactoring, check out this Colab.
Notably, Whisper is now also supported in Flax 🚀 thanks to @andyehrenberg!
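
As a quick illustration, here is a minimal sketch of the new generate() options, assuming the openai/whisper-tiny checkpoint and a 16 kHz mono waveform `speech` (both purely illustrative choices):

from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")

# Language, task and timestamp prediction can now be requested directly in generate().
generated_ids = model.generate(
    inputs.input_features,
    language="english",
    task="transcribe",
    return_timestamps=True,
)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)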

DETA

DETA (short for Detection Transformers with Assignment) improves Deformable DETR by replacing the one-to-one bipartite Hungarian matching loss with one-to-many label assignments used in traditional detectors with non-maximum suppression (NMS). This leads to significant gains of up to 2.5 mAP.
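A hedged object-detection sketch, assuming the jozhang97/deta-swin-large checkpoint and a local image file (both purely illustrative):

import torch
from PIL import Image
from transformers import DetaImageProcessor, DetaForObjectDetection

image_processor = DetaImageProcessor.from_pretrained("jozhang97/deta-swin-large")
model = DetaForObjectDetection.from_pretrained("jozhang97/deta-swin-large")

image = Image.open("photo.jpg")
inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Keep detections above a confidence threshold, converted back to the image's original size.
target_sizes = torch.tensor([image.size[::-1]])
results = image_processor.post_process_object_detection(outputs, threshold=0.5, target_sizes=target_sizes)[0]
# `results` contains "scores", "labels" and "boxes" for the detected objects.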

SpeechT5

The SpeechT5 framework consists of a shared encoder-decoder network and six modal-specific (speech/text) pre/post-nets. After preprocessing the input speech/text through the pre-nets, the shared encoder-decoder network models the sequence-to-sequence transformation, and then the post-nets generate the output in the speech/text modality based on the output of the decoder.
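A minimal text-to-speech sketch, assuming the microsoft/speecht5_tts and microsoft/speecht5_hifigan checkpoints (used purely as illustrations):

import torch
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan

processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

inputs = processor(text="Hello, this is a test.", return_tensors="pt")

# A 512-dimensional x-vector speaker embedding; zeros are only a placeholder here,
# in practice this would come from a speaker-verification model.
speaker_embeddings = torch.zeros((1, 512))

speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)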

XLM-V

XLM-V is a multilingual language model with a one-million-token vocabulary, trained on 2.5TB of data from Common Crawl (the same corpus as XLM-R).

BLIP-2

BLIP-2 leverages frozen pre-trained image encoders and large language models (LLMs) by training a lightweight, 12-layer Transformer encoder in between them, achieving state-of-the-art performance on various vision-language tasks. Most notably, BLIP-2 improves upon Flamingo, an 80 billion parameter model, by 8.7% on zero-shot VQAv2 with 54x fewer trainable parameters.
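A minimal image-captioning sketch, assuming the Salesforce/blip2-opt-2.7b checkpoint and a local image file (both purely illustrative):

from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")

image = Image.open("photo.jpg")
inputs = processor(images=image, return_tensors="pt")

# Without a text prompt, the model produces a caption for the image.
generated_ids = model.generate(**inputs, max_new_tokens=20)
caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()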

X-MOD

X-MOD extends multilingual masked language models like XLM-R to include language-specific modular components (language adapters) during pre-training. For fine-tuning, the language adapters in each transformer layer are frozen.

Ernie-M

ERNIE-M is a new training method that encourages the model to align the representation of multiple languages with monolingual corpora, to overcome the constraint that the parallel corpus size places on the model performance.

TVLT

The Textless Vision-Language Transformer (TVLT) is a model that uses raw visual and audio inputs for vision-and-language representation learning, without using text-specific modules such as tokenization or automatic speech recognition (ASR). It can perform various audiovisual and vision-language tasks like retrieval, question answering, etc.

CLAP

CLAP (Contrastive Language-Audio Pretraining) is a neural network trained on a variety of (audio, text) pairs. It can be instructed in natural language to predict the most relevant text snippet for a given audio clip, without directly optimizing for the task. The CLAP model uses a Swin Transformer to extract audio features from a log-Mel spectrogram input, and a RoBERTa model to extract text features. Both the text and audio features are then projected into a latent space of identical dimension. The dot product between the projected audio and text features is then used as a similarity score.
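A hedged sketch of computing audio-text similarity, assuming the laion/clap-htsat-unfused checkpoint and a 48 kHz mono waveform `audio` (both purely illustrative):

import torch
from transformers import ClapModel, ClapProcessor

model = ClapModel.from_pretrained("laion/clap-htsat-unfused")
processor = ClapProcessor.from_pretrained("laion/clap-htsat-unfused")

text_inputs = processor(text=["the sound of a dog", "the sound of rain"], return_tensors="pt", padding=True)
audio_inputs = processor(audios=audio, sampling_rate=48_000, return_tensors="pt")

text_embeds = model.get_text_features(**text_inputs)
audio_embeds = model.get_audio_features(**audio_inputs)

# Cosine similarity between the audio clip and each text description.
similarity = torch.nn.functional.cosine_similarity(audio_embeds, text_embeds)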

GPTSAN

GPTSAN is a Japanese language model based on the Switch Transformer. It has the same structure as the Prefix LM model introduced in the T5 paper, and supports both text generation and masked language modeling. These basic tasks can similarly be fine-tuned for translation or summarization.

EfficientNet

EfficientNets are a family of image classification models which achieve state-of-the-art accuracy while being an order of magnitude smaller and faster than previous models.

ALIGN

ALIGN is a multi-modal vision and language model. It can be used for image-text similarity and for zero-shot image classification. ALIGN features a dual-encoder architecture with EfficientNet as its vision encoder and BERT as its text encoder, and learns to align visual and text representations with contrastive learning. Unlike previous work, ALIGN leverages a massive noisy dataset and shows that the scale of the corpus can be used to achieve SOTA representations with a simple recipe.
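A minimal zero-shot image classification sketch, assuming the kakaobrain/align-base checkpoint and a local image file (both purely illustrative):

import torch
from PIL import Image
from transformers import AlignProcessor, AlignModel

processor = AlignProcessor.from_pretrained("kakaobrain/align-base")
model = AlignModel.from_pretrained("kakaobrain/align-base")

image = Image.open("photo.jpg")
candidate_labels = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=candidate_labels, images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Probability of each candidate label for the image.
probs = outputs.logits_per_image.softmax(dim=-1)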

Informer

Informer is a method to be applied to long-sequence time-series forecasting. This method introduces a Probabilistic Attention mechanism to select the “active” queries rather than the “lazy” queries and provides a sparse Transformer thus mitigating the quadratic compute and memory requirements of vanilla attention.

API updates and improvements

Safetensors

safetensors is a safe serialization format for tensors, which has been supported in transformers as a first-class citizen for the past few versions.

This change makes it possible to explicitly force the from_pretrained method to use or not use safetensors. This unlocks a few use cases, notably the possibility of enforcing loading only from this format, limiting security risks.

Example of usage:

from transformers import AutoModel

# As of version v4.27.0, this loads the `pytorch_model.bin` by default if `safetensors` is not installed.
# It loads the `model.safetensors` file if `safetensors` is installed.
model = AutoModel.from_pretrained('bert-base-cased')

# This forces the load from the `model.safetensors` file.
model = AutoModel.from_pretrained('bert-base-cased', use_safetensors=True)

# This forces the load from the `pytorch_model.bin` file.
model = AutoModel.from_pretrained('bert-base-cased', use_safetensors=False)

Variant

This PR adds a "variant" keyword argument to PyTorch's from_pretrained and save_pretrained so that multiple weight variants can be saved in the model repo.

Example of usage with the huggingface/the-no-branch-repo model hosted on the Hub:

from transformers import CLIPTextModel

path = "huggingface/the-no-branch-repo"  # or ./text_encoder if local

# Loads the `fp16` variant. This loads the `pytorch_model.fp16.bin` file from this folder.
model = CLIPTextModel.from_pretrained(path, subfolder="text_encoder", variant="fp16")

# This loads the no-variant checkpoint, loading the `pytorch_model.bin` file from this folder.
model = CLIPTextModel.from_pretrained(path, subfolder="text_encoder")

bitsandbytes

The bitsandbytes integration is overhauled, now offering a new configuration: the BitsAndBytesConfig.

Read more about it in the documentation.
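
A minimal sketch of the new configuration object, assuming a CUDA GPU with bitsandbytes installed and using bigscience/bloom-1b7 purely as an illustration (the exact fields are described in the documentation):

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the model in 8-bit, with the int8 outlier threshold set explicitly.
quantization_config = BitsAndBytesConfig(load_in_8bit=True, llm_int8_threshold=6.0)

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-1b7",
    device_map="auto",
    quantization_config=quantization_config,
)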

FSDP

This PR enables the user to make use of the PyTorch/XLA implementation of FSDP, including the newly added auto-wrap feature. Four arguments have been added to training_args.py to facilitate this functionality (a usage sketch follows the list):

  • xla_fsdp: this flag is a string containing the location of a .json file which specifies the FSDP arguments the user wants to use when wrapping their model.
  • xla_fsdp_min_num_params: this flag is an int which will set a size-based automatic wrapping policy which automatically FSDP wraps any module with at least xla_fsdp_min_num_params many parameters.
  • xla_fsdp_transformer_layer_cls_to_wrap: this flag is a list of (case-sensitive) strings which will set a layer-class-based automatic wrapping policy which automatically FSDP wraps any module whose name matches one of the listed strings.
  • xla_fsdp_grad_ckpt: this flag is a bool which determines whether gradient checkpointing is enabled for the automatically wrapped layers.
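
A minimal sketch, assuming the four flags above are exposed directly on TrainingArguments as described; it requires running on a TPU with torch_xla installed, and the exact argument names and values should be checked against the documentation of your installed version:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",
    xla_fsdp="xla_fsdp_config.json",       # .json file with the FSDP wrapping arguments
    xla_fsdp_min_num_params=100_000_000,   # size-based auto-wrap policy
    xla_fsdp_grad_ckpt=True,               # gradient checkpointing for auto-wrapped layers
)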

Breaking changes

Generate

This PR standardizes beam search behavior across all three frameworks through early_stopping. PyTorch is unchanged, but TensorFlow and Flax users will see a significant speedup if they keep the default generation parameters.

There may, however, be minor differences in the outputs of the .generate method with beam search on TensorFlow and Flax. The differences should be very small and come with significant speedups, but if they break your workflow, we recommend you downgrade to a previous version and let us know in a GitHub issue so that we can investigate what is going on.

  • 🚨🚨 Generate: standardize beam search behavior across frameworks by @gante in #21368

Single model initialization

Model initialization had problems that made it incoherent across models and across initialization techniques. This is technically a bugfix, but as it may result in your models being initialized with different values, we think it best to highlight it here.

  • 🚨🚨🚨 Enforce single model initialization by @sgugger in #21431

Deprecations

This PR deprecates the parallelize API, which was replaced by accelerate several months ago. We recommend loading the model using the device_map argument and setting it to "balanced" to obtain the previous behavior.

Setting your own device_map is still permitted, but it needs to be a dictionary from module name to device, for example:

device_map = {'h.0': 0, 'h.1': 1, ...}
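
To obtain the previous parallelize behavior instead, a minimal sketch (using gpt2 purely as an illustration):

from transformers import AutoModelForCausalLM

# Replaces the deprecated parallelize API: accelerate spreads the model evenly across available GPUs.
model = AutoModelForCausalLM.from_pretrained("gpt2", device_map="balanced")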

Pipelines

A new pipeline focused on zero-shot audio classification is added to the repository.
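
A hedged example, using the laion/clap-htsat-unfused checkpoint and a local audio file purely as illustrations:

from transformers import pipeline

classifier = pipeline("zero-shot-audio-classification", model="laion/clap-htsat-unfused")

# Scores each candidate label against the audio clip, without task-specific training.
predictions = classifier("dog_bark.wav", candidate_labels=["sound of a dog", "sound of a vacuum cleaner"])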

Documentation

The task and model summaries have been refactored to take into account the larger number of tasks and models we now have.

Bugfixes and improvements

  • [t5] Fix T5 inference in float16 + bnb error by @younesbelkada in #21281
  • [examples/deepspeed] fix renamed api by @stas00 in #21283
  • [GenerationConfig] add additional kwargs handling by @ArthurZucker in #21269
  • [W2V2 with LM] Fix decoder test with params by @sanchit-gandhi in #21277
  • Fix TrainingArguments.label_names docs to reflect the correct default value behaviour by @fredtcaroli in #21288
  • Update expected values for doctest by @stevhliu in #21284
  • [GIT] Add test for batched generation by @NielsRogge in #21282
  • Supporting ImageProcessor in place of FeatureExtractor for pipelines by @Narsil in #20851
  • [Mask2Former] Add doc tests by @NielsRogge in #21232
  • Moving to cleaner tokenizer version or oneformer. by @Narsil in #21292
  • Fix EfficientFormer by @ydshieh in #21294
  • [Hubert] Fix Hubert processing auto by @younesbelkada in #21299
  • Update OneFormerModelIntegrationTest expected values by @ydshieh in #21295
  • [Doctest] Fix Blenderbot doctest by @younesbelkada in #21297
  • Documentation code sample fixes by @MKhalusova in #21302
  • [CI-Daily] replace past in prepare inputs for generation by @ArthurZucker in #21296
  • Small fix to ExponentialDecayLengthPenalty docstring by @njhill in #21308
  • Accept batched tensor of images as input to image processor by @amyeroberts in #21144
  • Use model_class.__name__ and compare against XXX_MAPPING_NAMES by @ydshieh in #21304
  • Fix 2 paths in the doctest list by @ydshieh in #21314
  • [i18n-KO] Translated quicktour page to Korean by @wonhyeongseo in #20946
  • Small QoL for qa. by @Narsil in #21316
  • check paths in utils/documentation_tests.txt by @ydshieh in #21315
  • Fix TFEncoderDecoder tests by @ydshieh in #21301
  • Generate: better compute_transition_scores examples by @gante in #21323
  • [Doctest] Fix Perceiver doctest by @younesbelkada in #21318
  • Update Hebrew language code to he per IANA registry by @altryne in #21310
  • Fix M2M100 positional embedding creation for ONNX by @michaelbenayoun in #21328
  • Fix RobertaPreLayerNorm doctest by @ydshieh in #21337
  • Little cleanup: let huggingface_hub manage token retrieval by @Wauplin in #21333
  • Automated compatible models list for task guides by @MKhalusova in #21338
  • Fix GitModelIntegrationTest.test_batched_generation device issue by @ydshieh in #21362
  • Pipeline testing - using tiny models on Hub by @ydshieh in #20426
  • fix the issue that the output dict of jit model could not get [0] by @sywangyi in #21354
  • Corrected by @HsiangNianian in #21350
  • Remove duplicate declarations in dummy inputs for TFLongformer by @peakji in #21352
  • Fix DETR tests after #21144 by @amyeroberts in #21365
  • Add cPython files in build by @sgugger in #21372
  • Generate: Relaxed max_length and max_new_tokens coexistence by @gante in #21347
  • Fixes path for Graphormer checkpoint by @clefourrier in #21367
  • Adding resource section to GPT-J docs by @adit299 in #21270
  • translate index to zh by @bfss in #20095)
  • [run_(clm|mlm).py examples] add streaming dataset support by @stas00 in #21343
  • Template for framework-agnostic tests by @gante in #21348
  • Cleanup the usage of layer_norm_eps in some models by @ydshieh in #21336
  • Do not log the generation config for each prediction step in TrainerSeq2Seq by @regisss in #21385
  • [Docs] Minor fixes by @NielsRogge in #21383
  • Simplify column_names in run_clm/mlm by @lhoestq in #21382
  • Add support of backward_prefetch and forward_prefetch by @raghavanone in #21237
  • Remove more unused attributes in config classes by @ydshieh in #21327
  • Generate: fix TF XLA tests on models with max_position_embeddings or max_target_positions by @gante in #21389
  • Update Graphormer and fix its torchscript test failures by @ydshieh in #21380
  • Moved LiLT under multimodal models in TOC by @MKhalusova in #21393
  • Fix the issue of using only inputs_embeds in convbert model by @raghavanone in #21398
  • Skip batches fast with accelerate by @sgugger in #21390
  • Added DagshubCallback by @jinensetpal in #21404
  • Add TF image classification example script by @amyeroberts in #19956
  • Generate: decoder-only models can generate with inputs_embeds by @gante in #21405
  • Use torch 1.13.1 in push/schedule CI by @ydshieh in #21421
  • Fix image_processor_class bug by @shikhartuli in #21410
  • Add distinct section names for PyTorch and TF by @Rocketknight1 in #21422
  • Add the GeLU activation from pytorch with the tanh approximation by @jlamypoirier in #21345
  • Fix Graphormer test suite by @clefourrier in #21419
  • [bnb] Fine-tuning HF 8-bit models by @younesbelkada in #21290
  • Allow to add more information in is_flaky by @ydshieh in #21426
  • Fix some pipeline tests by @ydshieh in #21401
  • Fix task guide formatting by @stevhliu in #21409
  • Fixes bug in the creation of ExponentialDecayLengthPenalty by @jorgemcgomes in #21423
  • Add inputs_embeds support for .generate() with BLOOM models by @akreal in #21430
  • Remove more unused attributes in config classes by @ydshieh in #21392
  • Added model resources for LayoutLM Issue#19848 by @avisinghal6 in #21377
  • Fix device issue in a ConvBertModelTest test by @ydshieh in #21438
  • do not scale gradient in bf16 mode by @kashif in #21428
  • exclude deleted files in the fixup script by @dtuit in #21436
  • Add tutorial doc for TF + TPU by @Rocketknight1 in #21429
  • For IterableDataset, return DataLoader using self._train_batch_size. … by @agossard in #21447
  • Avoid flaky generation sampling tests by @ydshieh in #21445
  • Fix SpeechT5ForSpeechToSpeechIntegrationTests device issue by @ydshieh in #21460
  • Add perf numbers for perf_train_cpu by @jianan-gu in #20974
  • Added documentation for DagsHubCallback by @jinensetpal in #21452
  • Fix PushToHubCallback import in Share a model docs by @ireneisdoomed in #21457
  • Add VQGAN-CLIP research project by @ErwannMillon in #21329
  • Fixed RAG script which was failing on dummy example by @kaustubhdhole in #21416
  • make SpeechT5 doc examples deterministic by @hollance in #21470
  • Generate: TF can now accept custom logits processors by @gante in #21454
  • Removing more_itertools dependency. by @Narsil in #21473
  • [examples] improve block_size warning message by @stas00 in #21463
  • [i18n-fr] Translate index page to French by @NoB0 in #21458
  • OPT: BLIP2-ready prepare_inputs_for_generation by @gante in #21477
  • Add tips for generation with Int8 models by @lewtun in #21424
  • Update quality tooling for formatting by @sgugger in #21480
  • Fix epoch number when resuming training by @sgugger in #21478
  • [CI ] Remove past in favor of pat_key_values by @ArthurZucker in #21443
  • Generate: TF can now generate from embeddings in encoder-decoder models by @gante in #21475
  • [Doc] Fix int8 docs by @younesbelkada in #21487
  • changed "ot" to "to" by @Iulian277 in #21488
  • 🖊️ fix typo in pytorch semantic segmentation readme by @jvdd in #21492
  • Typos/fixes to link syntax by @Rocketknight1 in #21450
  • Sanity check the type of id2label and label2id arguments of from_pretrained for TokenClassification models by @raghavanone in #21490
  • [OPT] Adds GPT2TokenizerFast to the list of tokenizer to use for OPT. by @ArthurZucker in #20823
  • A new test to check config attributes being used by @ydshieh in #21453
  • Add limit_all_gathers option to fsdp_config and fix forward_prefetch bug by @raghavanone in #21489
  • Cleanup quality by @sgugger in #21493
  • [tokenizer] sanitize saved config by @stas00 in #21483
  • Add inverse sqrt learning rate scheduler by @Sager611 in #21495
  • Check for mapping/dict in distributed_concat function by @prajwal967 in #21500
  • Fix import in Accelerate for find_exec_bs by @sgugger in #21501
  • Wrap RemBert integration test forward passes with torch.no_grad() by @katiele47 in #21503
  • Exclude the madeup words from M2M100Tokenizer.vocab_size by @guillaumekln in #20976
  • [Doc] Minor URL fixes in PyTorch Text Classification Readme by @stefan-it in #21511
  • Generate: TF compute_transition_scores by @gante in #21341
  • no more dummies for speech processors by @hollance in #21517
  • Update OPT conversion script to work for OPT-IML by @thomasw21 in #21519
  • [tests] add missing report_to none by @stas00 in #21505
  • Fixing backward compatiblity image_processor in pipeline. by @Narsil in #21513
  • Fix multiple eos_token_ids in model.generate(...) by @tokestermw in #21461
  • Add __len__ method to _LazyAutoMapping by @ydshieh in #21522
  • Generate: make TF .generate() signature == PT .generate() signature by @gante in #21525
  • Generate: TF .generate() can now be exported with dynamic length by @gante in #21474
  • Fix missing unfinished_sequences by @tokestermw in #21529
  • Fix ClearML Integration to run in ClearML pipelines and external Tasks. by @thepycoder in #21531
  • Tag tests as slow ⌛ by @gante in #21537
  • fix typo in run_speech_recognition_ctc.py by @21jun in #21528
  • Fix inclusion of non py files in package by @sgugger in #21546
  • Fix from_pretrained API with config and state_dict by @sgugger in #21542
  • Added with torch.no_grad() to XLM-Roberta integration test by @katiele47 in #21547
  • [pipeline] A simple fix for half-precision & 8bit models by @younesbelkada in #21479
  • Added with torch.no_grad() to Camembert integration test by @katiele47 in #21544
  • adding a tip for deepspeed integration in multi-node environment by @izapolsk in #21459
  • Fix stuff related to the causal_mask in CodeGen. by @GeneZC in #21527
  • Replace inefficient torch.sqrt taking scalar input with numpy.sqrt by @FindHao in #21496
  • Add _mp_fn to run_mae.py for XLA testing by @steventk-g in #21551
  • [Tests] Improve flax test_attention_outputs by @Shubhamai in #21486
  • [from_pretrained] extend torch_dtype="auto" to look up config.torch_dtype first, expand docs by @stas00 in #21524
  • [Tasks] Adds image captioning by @sayakpaul in #21512
  • Goodbye to Blip-2 doctests by @ydshieh in #21566
  • [deepspeed] deal with models w/o config.hidden_size by @stas00 in #21504
  • improving contributing tests section by @Shubhamai in #21569
  • Replace input_values_processing with unpack_inputs by @amyeroberts in #21502
  • Added timesformer configuration by @AdiaWu in #21446
  • Remove more unused attributes in config classes by @ydshieh in #21543
  • [Blip2] Add int8 support for blip2-flan-t5-xxl by @younesbelkada in #21574
  • Generate: TF supports multiple eos tokens by @gante in #21571
  • Add: document question answering task guide by @MKhalusova in #21518
  • CI: skip failing TF hubert test by @gante in #21601
  • Remove trailing 'extractive' word from en documentation by @tpaviot in #21594
  • [MINOR] Fix link in timeseries transformer docs by @cakiki in #21602
  • Add inputs_embeds support when generating with GPT-J by @dimitry12 in #21575
  • Generate: Fix flaky indexing error in test_constrained_beam_search_generate_dict_output by @gante in #21561
  • [bnb] Let's make the daily CI green 🍏 by @younesbelkada in #21597
  • annotated TFvisionEncoderDecoder input type hints by @miyu386 in #21432
  • Correct Markdown bullets indentation by @wangkuiyi in #21583
  • Add missing arguemtn to run_clip.py by @WarrenGreen in #21588
  • Fix Blip-2 CI by @ydshieh in #21595
  • Generate: correct default model input creation for decoder-only models by @gante in #21580
  • [i18n-fr] Translate quicktour page to French by @NoB0 in #21589
  • Update setup.py by @stas00 in #21584
  • [deepspeed] performance docs by @stas00 in #21573
  • Clarify available pipelines in quicktour by @stevhliu in #21607
  • Fix env. variable type issue in testing by @ydshieh in #21609
  • Fix TF CTC tests by @gante in #21606
  • Add in big model inference to issue template by @muellerzr in #21611
  • Enable requires_grad on input embedding to train on top of frozen layers by @younesbelkada in #21598
  • Generate: filter encoder inputs when its signature does not accept wildcards by @gante in #21603
  • Generate: input expansion for any model input by @gante in #21624
  • Final cleanup of TOKENIZER_FOR_DOC by @sgugger in #21565
  • Remove Niels from templates by @sgugger in #21564
  • Fix generation config for empty state dict by @sgugger in #21630
  • Removes duplicate computations in DETR post processing by @eclique in #21592
  • Fix typo in documentation. by @mmcdermott in #21632
  • Error (also in original) model, scaling only q matrix not qk.T dot product (qk.T/sqrt(dim_per_head)) by @BenoitDalFerro in #21627
  • Fix typo in QA task guide by @stevhliu in #21608
  • fix: Race Condition when using Sagemaker Checkpointing and Model Repository by @DougTrajano in #21614
  • Remove extra "max_length is reached." from InfNaNLogitsProcessor documentation by @mmcdermott in #21634
  • Fix Blip-2 CI again by @ydshieh in #21637
  • Skip wav2vec2 hubert high mem tests by @amyeroberts in #21643
  • Fix passing kwargs to TFBertTokenizer by @balvisio in #21619
  • Skipping more high mem tests - Wav2Vec2 Hubert by @amyeroberts in #21647
  • Pass parent exception as context exception to provide clearer stack trace by @balvisio in #21636
  • Generate: PT Dynamo without graph breaks in the main greedy/sample loop by @gante in #21648
  • Update deprecated load_module by @sgugger in #21651
  • Fix typos in contrastive-image-text example README by @regisss in #21665
  • [WIP] Move X-MOD models to facebook organization by @jvamvas in #21640
  • refactor: Make direct_transformers_import util by @connor-henderson in #21652
  • [bloom] gradient_checkpointing fix by @stas00 in #21655
  • Add OPT resources to the transformers documentation by @alissadb in #21625
  • Adapt PerceiverIO Multimodal class to work with arbitrary modalities by @stevenmanton in #20054
  • Fix multi-gpu training error for LayoutLMv2 by @akkikiki in #21675
  • Generate: eta sampling numerical stability by @gante in #21676
  • [ImageProcessor] Refactor default mean & std to OPENAI_CLIP_MEAN & OPENAI_CLIP_STD by @younesbelkada in #21425
  • [BLIP] update blip path on slow tests by @younesbelkada in #21476
  • Fix dynamic module import error by @ydshieh in #21646
  • Fix for non-contiguous label tensors in VisonEncoderDecoder by @morganmcg1 in #21582
  • Fix-rag-finetune-project-requirement by @ArthurZucker in #21697
  • Pass along revision in dynamic code fetch by @sgugger in #21698
  • Fix axial positional encoding calculations for reformer.mdx by @ijindal in #21649
  • remove position ids and token type ids from forward args in docstring by @ArthurZucker in #21701
  • Fix typo in PROCESSOR_MAPPING_NAMES and add tests by @ydshieh in #21703
  • Fix get_class_in_module by @ydshieh in #21709
  • Fix TVLT (torch device issue) by @ydshieh in #21710
  • Adding task guides to resources by @MKhalusova in #21704
  • Adding type hints to call() functions in this file by @mollerup23 in #21548
  • Time series transformer: input projection and Std scaler by @kashif in #21020
  • Apply ruff flake8-comprehensions by @Skylion007 in #21694
  • [MBart] Fix cross attention mask check by @younesbelkada in #21730
  • Respect documentation on passive log level by @sgugger in #21700
  • Remove gptsan_japanese from doctest list to avoid GPU OOM by @ydshieh in #21722
  • Change doc example for BigBirdForQuestionAnswering by @ydshieh in #21723
  • Fix ErnieMEmbeddings device issue by @ydshieh in #21726
  • Fix GPTSanJapaneseModel by @ydshieh in #21731
  • [SpeechT5HifiGan] Handle batched inputs by @sanchit-gandhi in #21702
  • Fix to KerasMetricCallback when the model returns unstructured output by @Rocketknight1 in #21727
  • Added "Open in Colab" to task guides by @MKhalusova in #21729
  • typos in french documentation by @tpaviot in #21750
  • Make ImageProcessorMixin compatible with subfolder kwarg by @Abhinay1997 in #21725
  • Update doctest GH workflow file by @ydshieh in #21744
  • Fix 2 quicktour file doctest by @ydshieh in #21742
  • [GPTNeo] Fix gradient checkpointing bug by @younesbelkada in #21733
  • Generate: Fix GIT batched captioning by @gante in #21738
  • Added Type Hints for modeling_tf_encoder_decoder.py by @Batese2001 in #21673
  • Auto api Value Error addition to Troubleshoot by @MKhalusova in #21708
  • [deepspeed tests] fix issues introduced by #21700 by @stas00 in #21769
  • Graphormer fix by @clefourrier in #21699
  • fix: Change is_last chunk calc and add conditional break in chunk_iter by @connor-henderson in #21612
  • [Flax] adding support for batch norm layers by @Shubhamai in #21581
  • [Examples] Generalise run audio classification for log-mel models by @sanchit-gandhi in #21756
  • Different behavior in DistilBERT when using "inputs_embeds" by @ArthurZucker in #21752
  • [Flax] Fix erroneous kwargs being passed to generate config by @sanchit-gandhi in #21765
  • Generate - update cookie cutters to not initialize cache with training and gradient checkpointing by @gante in #21759
  • [time series] updated expected values for integration test. by @kashif in #21762
  • [GPT2, ProphetNet] Fix gradient checkpointing bug by @yhl48 in #21772
  • [SpeechT5] Fix HiFiGAN tests by @sanchit-gandhi in #21788
  • Fix resume_from_checkpoint for deepspeed by @mosheber in #21735
  • [examples/summarization] deal with max_length and num_beams by @bofenghuang in #21740
  • Fix type in gpt2 config docstring by @WeberJulian in #21782
  • Fix en documentation typos by @tpaviot in #21799
  • [FX tracer] Make concrete_args from outside available by @lygztq in #21775
  • [torch] remove deprecated uint8 in favor of bool by @ArthurZucker in #21384
  • [tests] add accelerate marker by @younesbelkada in #21743
  • Fix PyTorch Perceiver PerceiverFourierPositionEncoding with fp16 by @fxmarty in #21787
  • Fix nn.init.trunc_normal_ call on torch.float16 data by @fxmarty in #21789
  • Fix gradient checkpointing bug in gptneox by @KMFODA in #21815
  • Inheritance-based framework detection by @gante in #21784
  • Fix quality with ruff==0.0.253 by @ydshieh in #21828
  • introduce logger.warning_once and use it for grad checkpointing code by @stas00 in #21804
  • Rename MobileViTModelTest to TFMobileViTModelTest by @ydshieh in #21825
  • Fix gradient checkpointing bug BioGpt by @saswatmeher in #21844
  • check for None forced tokens by @andyehrenberg in #21793
  • Fix gradient checkpointing bug in git by @KMFODA in #21818
  • Fix gradient checkpointing imagegpt by @KMFODA in #21816
  • Fix tf random token masking probability in data collator by @anruijian in #21834
  • [T5] Fix torchquant issue by @younesbelkada in #21843
  • [Blip2] Add Blip2Model by @younesbelkada in #21817
  • Fix the issue of blip model returning loss even when the label is not provided. by @raghavanone in #21811
  • [GPTJ] Fix gradient checkpointing bug by @krypticmouse in #21794
  • Add: task guide for zero shot object detection by @MKhalusova in #21829
  • Make Slack CI reporting stronger by @ydshieh in #21823
  • [Blip2] Fix Blip-2 multi gpu by @younesbelkada in #21707
  • 🔥Rework pipeline testing by removing PipelineTestCaseMeta 🚀 by @ydshieh in #21516
  • Improve TF weight loading, especially PT crossloading by @Rocketknight1 in #21792
  • Fix flaky test for log level by @sgugger in #21776
  • prepare for "floordiv is deprecated and its behavior will change in a future version of pytorch" by @ArthurZucker in #20211
  • [ConvBert] Fix #21523 by @ArthurZucker in #21849
  • Flax beam search fix by @andyehrenberg in #21857
  • Fix gradient checkpointing bug Bart by @saswatmeher in #21866
  • [deepspeed] check whether model is NLP one instead of counting on input type by @izapolsk in #21800
  • Change the way tensor is reshaped in BartAttention (from .view to .reshape) by @raghavanone in #21860
  • Italian translation of community.mdx by @lorenzobalzani in #21871
  • [Blip] Fix blip doctest by @younesbelkada in #21868
  • Removed BLIP mention from the troubleshooting guide by @MKhalusova in #21872
  • update FSDP and add XLA-FSDP documentation by @pacman100 in #21812
  • [doc] deepspeed tests by @stas00 in #21859
  • Add an utility file to get information from test files by @ydshieh in #21856
  • Add check for different embedding types in examples by @Rocketknight1 in #21881
  • Make loading of pretrained gpt2 faster by avoiding initialization of Conv1D's weights by @twaka in #21879
  • Fix Gradient checkpointing bug BigBird by @saswatmeher in #21882
  • Fix test_load_default_pipelines_pt for ClapModel by @ydshieh in #21886
  • fix checkpoint by @ArthurZucker in #21874
  • [Refactor] Relative imports wherever we can by @ArthurZucker in #21880
  • [ZAC] fix ci daily by @ArthurZucker in #21893
  • Use PyAV instead of Decord in examples by @amyeroberts in #21572
  • Add inputs_embeds functionality when generating with BioGPT by @sidkiblawi in #21889
  • [T5 doc] Fix confusing documentation about d_kv by @ArthurZucker in #21896
  • fix typo in Bart's attention by @kashif in #21898
  • [GPT-J] add deprecation warning by @ArthurZucker in #21869
  • fsdp bf16 enable autocast by @pacman100 in #21847
  • Fix gradient checkpointing bug LED by @KMFODA in #21840
  • Fix gradient checkpointing bug M2M 100 by @KMFODA in #21841
  • Fix gradient checkpointing bug marian by @KMFODA in #21842
  • Mark pipeline tests to skip them easily by @sgugger in #21887
  • Clean up auto mapping names by @ydshieh in #21903
  • Prophetnet batch dimension inversion fix by @kiansierra in #21870
  • Make schedulers picklable by making lr_lambda fns global by @connor-henderson in #21768
  • Add Blip and Blip2 for pipeline tests by @ydshieh in #21904
  • Temporarily skip 3 tests in BridgeTowerModelTest by @ydshieh in #21908
  • Faster zero shot image by @Narsil in #21897
  • [time series] Add Time series inputs tests by @kashif in #21846
  • Avoid modeling tests run in pipeline CI jobs by @ydshieh in #21911
  • Fix doctests for TFVisionTextDualEncoder by @Rocketknight1 in #21910
  • faster forward following what is done for images by @ArthurZucker in #21906
  • Fix gradient checkpointing bug in MBart by @KMFODA in #21918
  • Fix gradient checkpointing bug in mvp by @KMFODA in #21920
  • Fix gradient checkpointing megatron bert by @KMFODA in #21921
  • Use large VM for repo_utils_job by @ydshieh in #21928
  • Cleanup more auto mapping names by @ydshieh in #21909
  • feat: filter try/except when looking at custom code by @zanussbaum in #21914
  • Fix AlignModelTest tests by @ydshieh in #21923
  • Avoid failure in check_repo.py due to missing backends by @ydshieh in #21930
  • Fix wrong documentation about DataCollator padding defaults by @substanc3-dev in #21919
  • [Flan-UL2] Add-flan-ul2 by @ArthurZucker in #21929
  • Update README logo by @gary149 in #21933
  • [CLAP] Support batched inputs for CLAP. Fixes pipeline issues by @ArthurZucker in #21931
  • Fix gradient checkpointing bug in OPT by @KMFODA in #21943
  • Fix gradient checkpointing bug in Pegasus by @KMFODA in #21944
  • Fix gradient checkpointing bug in Rembert by @KMFODA in #21945
  • Fix gradient checkpointing bug in Roformer by @KMFODA in #21946
  • Fixed gradient_checkpointing/use_cache bug in blenderbot by @Batese2001 in #21833
  • Update expected values in XLMProphetNetModelIntegrationTest by @ydshieh in #21957
  • [CI] Fix ci by @ArthurZucker in #21940
  • Disable DDP for neuron by @aws-sangeetha in #21953
  • Fix bert issue by @saswatmeher in #21963
  • [Generate] Fix gradient_checkpointing and use_cache bug for BLOOM by @asrimanth in #21956
  • Add missing parameter definition in layoutlm config by @Atomnp in #21960
  • Use larger atol in torch.allclose for some tests by @ydshieh in #21966
  • Add TF contrastive image text finetuning example by @Rocketknight1 in #21939
  • Update expected values for test_xglm_sample by @ydshieh in #21975
  • Fix gradient checkpointing bug in BigBird Pegasus by @KMFODA in #21976
  • Fix gradient checkpointing bug in Blenderbot Small by @KMFODA in #21977
  • Fix gradient checkpointing bug in BlipText by @KMFODA in #21978
  • Fix gradient checkpointing bug in Codegen by @KMFODA in #21979
  • Fix gradient checkpointing bug in ESM by @KMFODA in #21980
  • docs: improve clarity for language modeling by @pdhall99 in #21952
  • Update Jukebox tests by @ydshieh in #21984
  • Add check before int casting for PIL conversion by @amyeroberts in #21969
  • Fix MinNewTokensLengthLogitsProcessor when used with a list of eos tokens by @eladsegal in #21959
  • [DETR, YOLOS] Fix device bug by @NielsRogge in #21974
  • Remove unneeded casts to bool by @regisss in #21983
  • Update notification_service.py by @ydshieh in #21992
  • Skip test_multi_gpu_data_parallel_forward for some model tests by @ydshieh in #21991
  • Stop requiring Torch for our TF examples! by @Rocketknight1 in #21997
  • [TF] Fix creating a PR while pushing in TF framework by @ArthurZucker in #21968
  • [DETR and friends] Remove is_timm_available by @NielsRogge in #21814
  • Update tiny model creation script and some others files by @ydshieh in #22006
  • Generate - add 1 to cur_len to make up the new beam length by @jimmieliu in #21993
  • VideoMAE doctest - use valid dummy pixel values by @amyeroberts in #22022
  • update: bertology paper by @QiushiSun in #22012
  • Update AudioClassificationPipelineTests::test_small_model_pt for PT 2.0.0 by @ydshieh in #22023
  • [bnb] Fix bnb error message by @younesbelkada in #22026
  • Fix test for torchneuroncore in Trainer by @sgugger in #22028
  • Add tokenize_kwargs parameter definition in the FeatureExtractionPipeline by @anruijian in #22031
  • [examples/speech-recognition] Add SpecAugment to run_speech_recognition_seq2seq.py by @bofenghuang in #21942
  • Avoid text_config_dict and vision_config_dict being saved for CLIP-like models by @ydshieh in #22035
  • Mark all BridgeTower tests slow for now by @ydshieh in #22039
  • Bug fix: token classification pipeline while passing offset_mapping by @cceyda in #22034
  • Update ALIGN docs by @alaradirik in #22025
  • [21737][T5]: Fix gradient checkpoint bug by @nipunjindal in #22036
  • Docs Improvement - In ZSH, not using ' ' around pip install fails, fix it by @shaun-scale in #22045
  • Can't install tf2 on M1 Chip by default by @shaun-scale in #22046
  • Remove set_access_token usage + fail tests if FutureWarning by @Wauplin in #22051
  • Show the number of huggingface_hub warnings in CI report by @ydshieh in #22054
  • Return analysis for hyperparameter_search with Ray backend by @anruijian in #22040
  • pt-to-tf model architecture override by @Rocketknight1 in #22055
  • rm $ symbol from code block from contributing.md by @kamalkraj in #22057
  • [deepspeed] offload + non-cpuadam optimizer exception by @stas00 in #22043
  • Edit the docstring of image_processing_donut to match code by @vermouthmjl in #22033
  • Add setters by type of args to TrainingArguments by @sgugger in #21570
  • Update tiny model creation script by @ydshieh in #22058
  • Fix case when using --gradient_accumulation_steps with DDP disabled. by @aws-sangeetha in #22007
  • Add a progress bar for the total download of shards by @sgugger in #22062
  • Fix gradient checkpointing bug in Speech2Text by @KMFODA in #22079
  • Fix gradient checkpointing bug in switch transformer by @KMFODA in #22081
  • [GPT2] Propose fix for #21080 by @ArthurZucker in #21853
  • Fix small typo in flan-ul2.mdx by @kevin51jiang in #22068
  • Generate - Fix broken documentation links by @gante in #22078
  • Fix gradient checkpointing bug in Speecht5 by @KMFODA in #22080
  • Fix hint in src/transformers/modeling_utils.py by @J-shang in #22074
  • handle numpy inputs in whole word mask data collator by @dwyatte in #22032
  • GPT-J specific half precision on CPU note by @MKhalusova in #22086
  • Fix imports of TF MobileViT by @sgugger in #22065
  • Revert "[GPT2] Propose fix for #21080" by @ydshieh in #22093
  • Add AutoModelForZeroShotImageClassification by @alaradirik in #22087
  • add new model of MGP-STR by @wdp-007 in #21418
  • Add pr_checks.mdx Italian translation by @alexcalabrese in #17459)
  • Fix gradient checkpointing bug in xglm by @KMFODA in #22127
  • Add TFVisionTextDualEncoder by @Rocketknight1 in #21873
  • Fix gradient checkpointing bug in Trajectory Transformer by @KMFODA in #22125
  • Fix gradient checkpointing bug in xlm_roberta_xl by @KMFODA in #22128
  • Added big_models.mdx italian translation #17600 by @nickprock in #22115
  • [Blip2] skip accelerate test by @younesbelkada in #22124
  • Fix gradient checkpointing bug in xmod by @KMFODA in #22129
  • Fix gradient checkpointing bug in LongT5 by @KMFODA in #22130
  • Fix gradient checkpointing bug in trocr by @KMFODA in #22126
  • Zero-shot image classification task guide by @MKhalusova in #22132
  • Fix doc link for MGP-STR by @sgugger in #22138
  • Adding Type Hints to TF_Pegasus model by @mollerup23 in #21941
  • Add a new script to check model testers' config by @ydshieh in #22063
  • Update configuration_align.py (projected_dim=640) by @bishmdl76 in #22139
  • Trainer: let generate pick its inputs by @gante in #22108
  • Enforce same behavior as PyTorch 2.0 for older versions by @sgugger in #22136
  • [trainer] fix bug in grad accum with multiple epochs by @stas00 in #22098
  • [deepspeed docs] Activation Checkpointing by @stas00 in #22099
  • Remove backend check for torch.compile by @sgugger in #22140
  • Prepare daily CI for torch 2.0.0 by @ydshieh in #22135
  • docs: New terms and updates to glossary by @MichaelRipa in #21982
  • Move is_pipeline_test_to_skip to specific model test classes by @ydshieh in #21999
  • Add ConvNeXT V2 by @alaradirik in #21679
  • Update 2 doctest expected values for torch 2.0.0 by @ydshieh in #22148
  • Translation Italian: perf_train_cpu and perf_train_cpu_many by @nickprock in #22151
  • Fix big model inference for T5 models in float16 by @sgugger in #22095
  • Create MaskedImageCompletionOutput and fix ViT docs by @alaradirik in #22152
  • to_pil - don't rescale if int and in range 0-255 by @amyeroberts in #22158
  • [trainer] add --optim adamw_torch_fused for pt-2.0+ by @stas00 in #22144
  • Revert "Enforce same behavior as PyTorch 2.0 for older versions" by @sgugger in #22163

Significant community contributions

The following contributors have made significant changes to the library over the last release:
