BridgeTower
The goal of this model is to build a bridge between each uni-modal encoder and the cross-modal encoder, enabling comprehensive and detailed interaction at each layer of the cross-modal encoder. It achieves remarkable performance on various downstream tasks with negligible additional parameters and computational cost.
- Add BridgeTower model by @abhiwand in #20775
- Add loss for BridgeTowerForMaskedLM and BridgeTowerForImageAndTextRetrieval by @abhiwand in #21684
- [WIP] Add BridgeTowerForContrastiveLearning by @abhiwand in #21964
Whisper speedup
The Whisper model was integrated a few releases ago. This release brings significant performance optimizations when generating with timestamps, made possible by rewriting Whisper's `generate()` function, which now uses the `generation_config`, and by implementing batched timestamp prediction. The `language` and `task` can now also be set when calling `generate()`. For more details about this refactoring, check out this colab.
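As a hedged illustration of the refactored API (the checkpoint name and `audio_array`, a 16 kHz mono waveform, are placeholders, and exact keyword support may vary by version):

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# A minimal sketch, assuming `language`/`task`/`return_timestamps` are accepted
# by `generate()` as described above; `audio_array` is a placeholder 16 kHz
# mono waveform (e.g. a 1-D numpy array).
processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

inputs = processor(audio_array, sampling_rate=16_000, return_tensors="pt")
predicted_ids = model.generate(
    inputs.input_features,
    language="en",
    task="transcribe",
    return_timestamps=True,  # uses the new batched timestamp prediction
)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True))
```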
Notably, Whisper is now also supported in Flax 🚀 thanks to @andyehrenberg! More Whisper-related commits:
- [Whisper] Refactor whisper by @ArthurZucker in #21252
- [WHISPER] Small patch by @ArthurZucker in #21307
- [Whisper] another patch by @ArthurZucker in #21324
- add flax whisper implementation by @andyehrenberg in #20479
- Add WhisperTokenizerFast by @jonatanklosko in #21222
- Remove CLI spams with Whisper FeatureExtractor by @qmeeus in #21267
- Update document of WhisperDecoderLayer by @ling0322 in #21621
- [WhisperModel] fix bug in reshaping labels by @jonatasgrosman in #21653
- [Whisper] Add SpecAugment by @bofenghuang in #21298
- Fix-ci-whisper by @ArthurZucker in #21767
- Fix `WhisperModelTest` by @ydshieh in #21883
- [Whisper] Add rescaling function with `do_normalize` by @ArthurZucker in #21263
- Refactor whisper asr pipeline to include language too. by @Narsil in #21427
- Update `model_split_percents` for `WhisperModelTest` by @ydshieh in #21922
- [Whisper] Fix feature normalization in `WhisperFeatureExtractor` by @bofenghuang in #21938
- [Whisper] Add model for audio classification by @sanchit-gandhi in #21754
- fixes the gradient checkpointing of whisper by @soma2000-lang in #22019
- Skip 3 tests for `WhisperEncoderModelTest` by @ydshieh in #22060
- [Whisper] Remove embed_tokens from encoder docstring by @sanchit-gandhi in #21996
- [Whisper] add `get_input_embeddings` to `WhisperForAudioClassification` by @younesbelkada in #22133
- [🛠️] Fix-whisper-breaking-changes by @ArthurZucker in #21965
DETA
DETA (short for Detection Transformers with Assignment) improves Deformable DETR by replacing the one-to-one bipartite Hungarian matching loss with one-to-many label assignments used in traditional detectors with non-maximum suppression (NMS). This leads to significant gains of up to 2.5 mAP.
- Add DETA by @NielsRogge in #20983
SpeechT5
The SpeechT5 framework consists of a shared encoder-decoder network and six modal-specific (speech/text) pre/post-nets. After preprocessing the input speech/text through the pre-nets, the shared encoder-decoder network models the sequence-to-sequence transformation, and then the post-nets generate the output in the speech/text modality based on the output of the decoder.
XLM-V
XLM-V is a multilingual language model with a one-million-token vocabulary, trained on 2.5TB of data from Common Crawl (the same as XLM-R).
- Add XLM-V to Model Doc by @stefan-it in #21498
BLIP-2
BLIP-2 leverages frozen pre-trained image encoders and large language models (LLMs) by training a lightweight, 12-layer Transformer encoder in between them, achieving state-of-the-art performance on various vision-language tasks. Most notably, BLIP-2 improves upon Flamingo, an 80 billion parameter model, by 8.7% on zero-shot VQAv2 with 54x fewer trainable parameters.
- Add BLIP-2 by @NielsRogge in #21441
X-MOD
X-MOD extends multilingual masked language models like XLM-R to include language-specific modular components (language adapters) during pre-training. For fine-tuning, the language adapters in each transformer layer are frozen.
Ernie-M
ERNIE-M is a new training method that encourages the model to align the representation of multiple languages with monolingual corpora, to overcome the constraint that the parallel corpus size places on the model performance.
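- Add Ernie-M Model to huggingface by @susnato in #21349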
TVLT
The Textless Vision-Language Transformer (TVLT) is a model that uses raw visual and audio inputs for vision-and-language representation learning, without using text-specific modules such as tokenization or automatic speech recognition (ASR). It can perform various audiovisual and vision-language tasks like retrieval, question answering, etc.
- Add TVLT by @zinengtang in #20725
CLAP
CLAP (Contrastive Language-Audio Pretraining) is a neural network trained on a variety of (audio, text) pairs. It can be instructed to predict the most relevant text snippet for a given audio clip, without directly optimizing for the task. The CLAP model uses a Swin Transformer to extract audio features from a log-Mel spectrogram input, and a RoBERTa model to extract text features. Both the text and audio features are then projected to a latent space of identical dimension. The dot product between the projected audio and text features is used as a similarity score.
- [CLAP] Add CLAP to the library by @ArthurZucker in #21370
- [`CLAP`] Fix few broken things by @younesbelkada in #21670
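A hedged sketch of the similarity computation described above (the checkpoint name and `audio_array`, a 48 kHz waveform, are illustrative assumptions):

```python
from transformers import ClapModel, ClapProcessor

# Minimal sketch of the audio/text similarity scoring described above.
model = ClapModel.from_pretrained("laion/clap-htsat-unfused")
processor = ClapProcessor.from_pretrained("laion/clap-htsat-unfused")

inputs = processor(
    text=["the sound of a dog barking", "a vacuum cleaner"],
    audios=[audio_array],  # placeholder waveform sampled at 48 kHz
    sampling_rate=48_000,
    return_tensors="pt",
    padding=True,
)
outputs = model(**inputs)
print(outputs.logits_per_audio)  # dot products of the projected audio/text features
```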
GPTSAN
GPTSAN is a Japanese language model based on the Switch Transformer. It has the same structure as the model introduced as Prefix LM in the T5 paper, and supports both text generation and masked language modeling tasks. These basic tasks can similarly be fine-tuned for translation or summarization.
- add GPTSAN model (reopen) by @tanreinama in #21291
EfficientNet
EfficientNets are a family of image classification models that achieve state-of-the-art accuracy while being an order of magnitude smaller and faster than previous models.
- Add EfficientNet by @alaradirik in #21563
ALIGN
ALIGN is a multi-modal vision and language model. It can be used for image-text similarity and for zero-shot image classification. ALIGN features a dual-encoder architecture with EfficientNet as its vision encoder and BERT as its text encoder, and learns to align visual and text representations with contrastive learning. Unlike previous work, ALIGN leverages a massive noisy dataset and shows that the scale of the corpus can be used to achieve SOTA representations with a simple recipe.
- Add ALIGN to transformers by @alaradirik in #21741
Informer
Informer is a method for long-sequence time-series forecasting. It introduces a probabilistic attention mechanism to select the "active" queries rather than the "lazy" queries, and provides a sparse Transformer, thus mitigating the quadratic compute and memory requirements of vanilla attention.
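- [Time-Series] informer model by @elisim in #21099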
API updates and improvements
Safetensors
`safetensors` is a safe format for serializing tensors, which has been supported in `transformers` as a first-class citizen for the past few versions.
This change makes it possible to explicitly force the `from_pretrained` method to use or not use `safetensors`. This unlocks a few use cases, notably the possibility of enforcing loading only from this format, limiting security risks.
Example of usage:
```python
from transformers import AutoModel

# As of version v4.27.0, this loads `pytorch_model.bin` by default if `safetensors` is not installed.
# It loads the `model.safetensors` file if `safetensors` is installed.
model = AutoModel.from_pretrained('bert-base-cased')

# This forces the load from the `model.safetensors` file.
model = AutoModel.from_pretrained('bert-base-cased', use_safetensors=True)

# This forces the load from the `pytorch_model.bin` file.
model = AutoModel.from_pretrained('bert-base-cased', use_safetensors=False)
```
- [Safetensors] Add explicit flag to from pretrained by @patrickvonplaten in #22083
Variant
This PR adds a `variant` keyword argument to the PyTorch `from_pretrained` and `save_pretrained` methods so that multiple weight variants can be saved in a model repo.
Example of usage with the model hosted in this folder on the Hub:
```python
from transformers import CLIPTextModel

path = "huggingface/the-no-branch-repo"  # or ./text_encoder if local

# Loads the `fp16` variant. This loads the `pytorch_model.fp16.bin` file from this folder.
model = CLIPTextModel.from_pretrained(path, subfolder="text_encoder", variant="fp16")

# This loads the no-variant checkpoint, loading the `pytorch_model.bin` file from this folder.
model = CLIPTextModel.from_pretrained(path, subfolder="text_encoder")
```
- Add variant to transformers by @patrickvonplaten in #21332
- [Variant] Make sure variant files are not incorrectly deleted by @patrickvonplaten in #21562
bitsandbytes
The `bitsandbytes` integration has been overhauled and now offers a new configuration: the `BitsAndBytesConfig`.
Read more about it in the documentation.
- [`bnb`] Introducing `BitsAndBytesConfig` by @younesbelkada in #21579
- [`bnb`] fix `bnb` decoders bug by @younesbelkada in #21688
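A minimal, hedged sketch of the new configuration (the checkpoint name and threshold value are illustrative):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch: load a causal LM in 8-bit with an explicit bitsandbytes configuration.
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,  # outlier threshold for int8 matrix multiplication
)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    device_map="auto",
    quantization_config=quantization_config,
)
```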
FSDP
This PR enables the user to make use of the PyTorch/XLA implementation of FSDP, including the newly added auto-wrap feature. Four arguments have been added to `training_args.py` to facilitate this functionality (a usage sketch follows the list below):
- `xla_fsdp`: this flag is a string containing the location of a `.json` file which specifies the FSDP arguments the user wants to use when wrapping their model.
- `xla_fsdp_min_num_params`: this flag is an int which sets a size-based automatic wrapping policy, automatically FSDP-wrapping any module with at least `xla_fsdp_min_num_params` parameters.
- `xla_fsdp_transformer_layer_cls_to_wrap`: this flag is a list of (case-sensitive) strings which sets a layer-class-based automatic wrapping policy, automatically FSDP-wrapping any module whose name matches one of the listed strings.
- `xla_fsdp_grad_ckpt`: this flag is a bool which determines whether gradient checkpointing is enabled for the automatically wrapped layers.
- Enable PyTorch/XLA Fully Sharded Data Parallel (FSDP) by @AlexWertheim in #21406
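A sketch only, assuming the flags above behave exactly as described; the argument names and values are taken from the PR description and may differ across versions:

```python
from transformers import TrainingArguments

# Hypothetical usage of the XLA-FSDP flags described above; the output
# directory and JSON path are placeholders.
training_args = TrainingArguments(
    output_dir="outputs",
    xla_fsdp="xla_fsdp_config.json",  # path to a .json file with the FSDP wrapping arguments
    xla_fsdp_grad_ckpt=True,          # gradient-checkpoint the automatically wrapped layers
)
```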
Breaking changes
Generate
This PR standardizes beam search behavior across all three frameworks through `early_stopping`. PyTorch is unchanged, but TensorFlow and Flax users will see a significant speedup if they keep the default generation parameters.
There are, however, minor differences in the outputs of the `.generate` method with beam search on TensorFlow and Flax. These should be very small and come with significant speedups, but if they break your workflow, we recommend downgrading to a previous version and letting us know in a GitHub issue so that we may investigate what is going on.
Single model initialization
Model initialization had problems that led to initialization being incoherent across models and across initialization techniques. This is technically a bugfix, but as it may result in your models being initialized with different values, we think it best to highlight it here.
Deprecations
This PR deprecates the `parallelize` API, which was replaced by `accelerate` months ago. We recommend loading the model using the `device_map` argument and setting it to `"balanced"` to obtain the previous behavior.
Setting your own `device_map` is still permitted, but it needs to be a dictionary from module name to device, for example:
```python
device_map = {'h.0': 0, 'h.1': 1, ...}
```
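A minimal sketch of the recommended replacement (the checkpoint name is illustrative; requires `accelerate`):

```python
from transformers import AutoModelForCausalLM

# "balanced" lets accelerate spread modules across available devices,
# mirroring the behavior of the deprecated `parallelize` API.
model = AutoModelForCausalLM.from_pretrained("gpt2", device_map="balanced")
```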
Pipelines
A new pipeline focused on zero-shot audio classification has been added to the repository.
- [Pipeline] Add zero shot audio classification pipeline by @ArthurZucker in #21600
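A hedged usage sketch (the CLAP checkpoint and the audio file name are illustrative):

```python
from transformers import pipeline

# Classify an audio clip against free-form candidate labels.
classifier = pipeline(task="zero-shot-audio-classification", model="laion/clap-htsat-unfused")
print(classifier("audio.wav", candidate_labels=["dog barking", "vacuum cleaner"]))
```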
Documentation
The task and model summaries have been refactored to take into account the larger number of tasks and models we now have.
Bugfixes and improvements
- [`t5`] Fix T5 inference in `float16` + `bnb` error by @younesbelkada in #21281
- [examples/deepspeed] fix renamed api by @stas00 in #21283
- [GenerationConfig] add additional kwargs handling by @ArthurZucker in #21269
- [W2V2 with LM] Fix decoder test with params by @sanchit-gandhi in #21277
- Fix `TrainingArguments.label_names` docs to reflect the correct default value behaviour by @fredtcaroli in #21288
- Update expected values for doctest by @stevhliu in #21284
- [GIT] Add test for batched generation by @NielsRogge in #21282
- Supporting `ImageProcessor` in place of `FeatureExtractor` for pipelines by @Narsil in #20851
- [Mask2Former] Add doc tests by @NielsRogge in #21232
- Moving to cleaner tokenizer version or `oneformer`. by @Narsil in #21292
- Fix `EfficientFormer` by @ydshieh in #21294
- [Hubert] Fix Hubert processing auto by @younesbelkada in #21299
- Update `OneFormerModelIntegrationTest` expected values by @ydshieh in #21295
- [Doctest] Fix `Blenderbot` doctest by @younesbelkada in #21297
- Documentation code sample fixes by @MKhalusova in #21302
- [CI-Daily] replace `past` in prepare inputs for generation by @ArthurZucker in #21296
- Small fix to ExponentialDecayLengthPenalty docstring by @njhill in #21308
- Accept batched tensor of images as input to image processor by @amyeroberts in #21144
- Use `model_class.__name__` and compare against `XXX_MAPPING_NAMES` by @ydshieh in #21304
- Fix 2 paths in the doctest list by @ydshieh in #21314
- [i18n-KO] Translated quicktour page to Korean by @wonhyeongseo in #20946
- Small QoL for qa. by @Narsil in #21316
- check paths in `utils/documentation_tests.txt` by @ydshieh in #21315
- Fix `TFEncoderDecoder` tests by @ydshieh in #21301
- Generate: better `compute_transition_scores` examples by @gante in #21323
- [Doctest] Fix `Perceiver` doctest by @younesbelkada in #21318
- Update Hebrew language code to he per IANA registry by @altryne in #21310
- Fix M2M100 positional embedding creation for ONNX by @michaelbenayoun in #21328
- Fix `RobertaPreLayerNorm` doctest by @ydshieh in #21337
- Little cleanup: let huggingface_hub manage token retrieval by @Wauplin in #21333
- Automated compatible models list for task guides by @MKhalusova in #21338
- Fix `GitModelIntegrationTest.test_batched_generation` device issue by @ydshieh in #21362
- Pipeline testing - using tiny models on Hub by @ydshieh in #20426
- fix the issue that the output dict of jit model could not get [0] by @sywangyi in #21354
- Corrected by @HsiangNianian in #21350
- Remove duplicate declarations in dummy inputs for TFLongformer by @peakji in #21352
- Fix DETR tests after #21144 by @amyeroberts in #21365
- Add cPython files in build by @sgugger in #21372
- Generate: Relaxed `max_length` and `max_new_tokens` coexistence by @gante in #21347
- Fixes path for Graphormer checkpoint by @clefourrier in #21367
- Adding resource section to GPT-J docs by @adit299 in #21270
- translate index to zh by @bfss in #20095
- [`run_(clm|mlm).py` examples] add streaming dataset support by @stas00 in #21343
- Template for framework-agnostic tests by @gante in #21348
- Cleanup the usage of `layer_norm_eps` in some models by @ydshieh in #21336
- Do not log the generation config for each prediction step in TrainerSeq2Seq by @regisss in #21385
- [Docs] Minor fixes by @NielsRogge in #21383
- Simplify column_names in run_clm/mlm by @lhoestq in #21382
- Add support of backward_prefetch and forward_prefetch by @raghavanone in #21237
- Remove more unused attributes in config classes by @ydshieh in #21327
- Generate: fix TF XLA tests on models with `max_position_embeddings` or `max_target_positions` by @gante in #21389
- Update `Graphormer` and fix its `torchscript` test failures by @ydshieh in #21380
- Moved LiLT under multimodal models in TOC by @MKhalusova in #21393
- Fix the issue of using only inputs_embeds in convbert model by @raghavanone in #21398
- Skip batches fast with accelerate by @sgugger in #21390
- Added DagshubCallback by @jinensetpal in #21404
- Add TF image classification example script by @amyeroberts in #19956
- Generate: decoder-only models can generate with `inputs_embeds` by @gante in #21405
- Use torch `1.13.1` in push/schedule CI by @ydshieh in #21421
- Fix image_processor_class bug by @shikhartuli in #21410
- Add distinct section names for PyTorch and TF by @Rocketknight1 in #21422
- Add the GeLU activation from pytorch with the tanh approximation by @jlamypoirier in #21345
- Fix Graphormer test suite by @clefourrier in #21419
- [`bnb`] Fine-tuning HF 8-bit models by @younesbelkada in #21290
- Allow to add more information in `is_flaky` by @ydshieh in #21426
- Fix some pipeline tests by @ydshieh in #21401
- Fix task guide formatting by @stevhliu in #21409
- Fixes bug in the creation of ExponentialDecayLengthPenalty by @jorgemcgomes in #21423
- Add `inputs_embeds` support for `.generate()` with BLOOM models by @akreal in #21430
- Remove more unused attributes in config classes by @ydshieh in #21392
- Added model resources for LayoutLM Issue#19848 by @avisinghal6 in #21377
- Fix device issue in a `ConvBertModelTest` test by @ydshieh in #21438
- do not scale gradient in bf16 mode by @kashif in #21428
- exclude deleted files in the fixup script by @dtuit in #21436
- Add tutorial doc for TF + TPU by @Rocketknight1 in #21429
- For IterableDataset, return DataLoader using self._train_batch_size. … by @agossard in #21447
- Avoid flaky generation sampling tests by @ydshieh in #21445
- Fix `SpeechT5ForSpeechToSpeechIntegrationTests` device issue by @ydshieh in #21460
- Add perf numbers for perf_train_cpu by @jianan-gu in #20974
- Added documentation for DagsHubCallback by @jinensetpal in #21452
- Fix `PushToHubCallback` import in Share a model docs by @ireneisdoomed in #21457
- Add VQGAN-CLIP research project by @ErwannMillon in #21329
- Fixed RAG script which was failing on dummy example by @kaustubhdhole in #21416
- make SpeechT5 doc examples deterministic by @hollance in #21470
- Generate: TF can now accept custom logits processors by @gante in #21454
- Removing `more_itertools` dependency. by @Narsil in #21473
- [examples] improve block_size warning message by @stas00 in #21463
- [i18n-fr] Translate index page to French by @NoB0 in #21458
- OPT: BLIP2-ready `prepare_inputs_for_generation` by @gante in #21477
- Add tips for generation with Int8 models by @lewtun in #21424
- Update quality tooling for formatting by @sgugger in #21480
- Fix epoch number when resuming training by @sgugger in #21478
- [CI] Remove `past` in favor of `past_key_values` by @ArthurZucker in #21443
- Generate: TF can now generate from embeddings in encoder-decoder models by @gante in #21475
- [`Doc`] Fix int8 docs by @younesbelkada in #21487
- changed "ot" to "to" by @Iulian277 in #21488
- 🖊️ fix typo in pytorch semantic segmentation readme by @jvdd in #21492
- Typos/fixes to link syntax by @Rocketknight1 in #21450
- Sanity check the type of id2label and label2id arguments of from_pretrained for TokenClassification models by @raghavanone in #21490
- [OPT] Adds `GPT2TokenizerFast` to the list of tokenizers to use for OPT. by @ArthurZucker in #20823
- A new test to check config attributes being used by @ydshieh in #21453
- Add limit_all_gathers option to fsdp_config and fix forward_prefetch bug by @raghavanone in #21489
- Cleanup quality by @sgugger in #21493
- [tokenizer] sanitize saved config by @stas00 in #21483
- Add inverse sqrt learning rate scheduler by @Sager611 in #21495
- Check for mapping/dict in distributed_concat function by @prajwal967 in #21500
- Fix import in Accelerate for find_exec_bs by @sgugger in #21501
- Wrap RemBert integration test forward passes with torch.no_grad() by @katiele47 in #21503
- Exclude the madeup words from M2M100Tokenizer.vocab_size by @guillaumekln in #20976
- [Doc] Minor URL fixes in PyTorch Text Classification Readme by @stefan-it in #21511
- Generate: TF `compute_transition_scores` by @gante in #21341
- no more dummies for speech processors by @hollance in #21517
- Update OPT conversion script to work for OPT-IML by @thomasw21 in #21519
- [tests] add missing `report_to none` by @stas00 in #21505
- Fixing backward compatibility `image_processor` in pipeline. by @Narsil in #21513
- Fix multiple `eos_token_id`s in model.generate(...) by @tokestermw in #21461
- Add `__len__` method to `_LazyAutoMapping` by @ydshieh in #21522
- Generate: make TF `.generate()` signature == PT `.generate()` signature by @gante in #21525
- Generate: TF `.generate()` can now be exported with dynamic length by @gante in #21474
- Fix missing unfinished_sequences by @tokestermw in #21529
- Fix ClearML Integration to run in ClearML pipelines and external Tasks. by @thepycoder in #21531
- Tag tests as slow ⌛ by @gante in #21537
- fix typo in run_speech_recognition_ctc.py by @21jun in #21528
- Fix inclusion of non py files in package by @sgugger in #21546
- Fix from_pretrained API with config and state_dict by @sgugger in #21542
- Added with torch.no_grad() to XLM-Roberta integration test by @katiele47 in #21547
- [`pipeline`] A simple fix for half-precision & 8bit models by @younesbelkada in #21479
- Added with torch.no_grad() to Camembert integration test by @katiele47 in #21544
- adding a tip for deepspeed integration in multi-node environment by @izapolsk in #21459
- Fix stuff related to the causal_mask in CodeGen. by @GeneZC in #21527
- Replace inefficient torch.sqrt taking scalar input with numpy.sqrt by @FindHao in #21496
- Add _mp_fn to run_mae.py for XLA testing by @steventk-g in #21551
- [Tests] Improve flax test_attention_outputs by @Shubhamai in #21486
- [from_pretrained] extend `torch_dtype="auto"` to look up `config.torch_dtype` first, expand docs by @stas00 in #21524
- [Tasks] Adds image captioning by @sayakpaul in #21512
- Goodbye to Blip-2 doctests by @ydshieh in #21566
- [deepspeed] deal with models w/o `config.hidden_size` by @stas00 in #21504
- improving contributing tests section by @Shubhamai in #21569
- Replace input_values_processing with unpack_inputs by @amyeroberts in #21502
- Added timesformer configuration by @AdiaWu in #21446
- Remove more unused attributes in config classes by @ydshieh in #21543
- [`Blip2`] Add int8 support for `blip2-flan-t5-xxl` by @younesbelkada in #21574
- Generate: TF supports multiple eos tokens by @gante in #21571
- Add: document question answering task guide by @MKhalusova in #21518
- CI: skip failing TF hubert test by @gante in #21601
- Remove trailing 'extractive' word from en documentation by @tpaviot in #21594
- [MINOR] Fix link in timeseries transformer docs by @cakiki in #21602
- Add `inputs_embeds` support when generating with GPT-J by @dimitry12 in #21575
- Generate: Fix flaky indexing error in `test_constrained_beam_search_generate_dict_output` by @gante in #21561
- [`bnb`] Let's make the daily CI green 🍏 by @younesbelkada in #21597
- annotated TFvisionEncoderDecoder input type hints by @miyu386 in #21432
- Correct Markdown bullets indentation by @wangkuiyi in #21583
- Add missing argument to run_clip.py by @WarrenGreen in #21588
- Fix Blip-2 CI by @ydshieh in #21595
- Generate: correct default model input creation for decoder-only models by @gante in #21580
- [i18n-fr] Translate quicktour page to French by @NoB0 in #21589
- Update setup.py by @stas00 in #21584
- [deepspeed] performance docs by @stas00 in #21573
- Clarify available pipelines in quicktour by @stevhliu in #21607
- Fix env. variable type issue in testing by @ydshieh in #21609
- Fix TF CTC tests by @gante in #21606
- Add in big model inference to issue template by @muellerzr in #21611
- Enable `requires_grad` on input embedding to train on top of frozen layers by @younesbelkada in #21598
- Generate: filter encoder inputs when its signature does not accept wildcards by @gante in #21603
- Generate: input expansion for any model input by @gante in #21624
- Final cleanup of TOKENIZER_FOR_DOC by @sgugger in #21565
- Remove Niels from templates by @sgugger in #21564
- Fix generation config for empty state dict by @sgugger in #21630
- Removes duplicate computations in DETR post processing by @eclique in #21592
- Fix typo in documentation. by @mmcdermott in #21632
- Error (also in original) model, scaling only q matrix not qk.T dot product (qk.T/sqrt(dim_per_head)) by @BenoitDalFerro in #21627
- Fix typo in QA task guide by @stevhliu in #21608
- fix: Race Condition when using Sagemaker Checkpointing and Model Repository by @DougTrajano in #21614
- Remove extra "`max_length` is reached." from InfNaNLogitsProcessor documentation by @mmcdermott in #21634
- Fix Blip-2 CI again by @ydshieh in #21637
- Skip wav2vec2 hubert high mem tests by @amyeroberts in #21643
- Fix passing kwargs to TFBertTokenizer by @balvisio in #21619
- Skipping more high mem tests - Wav2Vec2 Hubert by @amyeroberts in #21647
- Pass parent exception as context exception to provide clearer stack trace by @balvisio in #21636
- Generate: PT Dynamo without graph breaks in the main greedy/sample loop by @gante in #21648
- Update deprecated load_module by @sgugger in #21651
- Fix typos in contrastive-image-text example README by @regisss in #21665
- [WIP] Move X-MOD models to facebook organization by @jvamvas in #21640
- refactor: Make direct_transformers_import util by @connor-henderson in #21652
- [bloom] gradient_checkpointing fix by @stas00 in #21655
- Add OPT resources to the transformers documentation by @alissadb in #21625
- Adapt PerceiverIO Multimodal class to work with arbitrary modalities by @stevenmanton in #20054
- Fix multi-gpu training error for LayoutLMv2 by @akkikiki in #21675
- Generate: eta sampling numerical stability by @gante in #21676
- [`ImageProcessor`] Refactor default `mean` & `std` to `OPENAI_CLIP_MEAN` & `OPENAI_CLIP_STD` by @younesbelkada in #21425
- [`BLIP`] update blip path on slow tests by @younesbelkada in #21476
- Fix dynamic module import error by @ydshieh in #21646
- Fix for non-contiguous label tensors in VisonEncoderDecoder by @morganmcg1 in #21582
- Fix-rag-finetune-project-requirement by @ArthurZucker in #21697
- Pass along revision in dynamic code fetch by @sgugger in #21698
- Fix axial positional encoding calculations for reformer.mdx by @ijindal in #21649
- remove position ids and token type ids from forward args in docstring by @ArthurZucker in #21701
- Fix typo in `PROCESSOR_MAPPING_NAMES` and add tests by @ydshieh in #21703
- Fix `get_class_in_module` by @ydshieh in #21709
- Fix TVLT (torch device issue) by @ydshieh in #21710
- Adding task guides to resources by @MKhalusova in #21704
- Adding type hints to call() functions in this file by @mollerup23 in #21548
- Time series transformer: input projection and Std scaler by @kashif in #21020
- Apply ruff flake8-comprehensions by @Skylion007 in #21694
- [`MBart`] Fix cross attention mask check by @younesbelkada in #21730
- Respect documentation on passive log level by @sgugger in #21700
- Remove `gptsan_japanese` from doctest list to avoid GPU OOM by @ydshieh in #21722
- Change doc example for `BigBirdForQuestionAnswering` by @ydshieh in #21723
- Fix `ErnieMEmbeddings` device issue by @ydshieh in #21726
- Fix `GPTSanJapaneseModel` by @ydshieh in #21731
by @ydshieh in #21731 - [SpeechT5HifiGan] Handle batched inputs by @sanchit-gandhi in #21702
- Fix to KerasMetricCallback when the model returns unstructured output by @Rocketknight1 in #21727
- Added "Open in Colab" to task guides by @MKhalusova in #21729
- typos in french documentation by @tpaviot in #21750
- Make ImageProcessorMixin compatible with subfolder kwarg by @Abhinay1997 in #21725
- Update doctest GH workflow file by @ydshieh in #21744
- Fix 2 quicktour file doctest by @ydshieh in #21742
- [`GPTNeo`] Fix gradient checkpointing bug by @younesbelkada in #21733
- Generate: Fix GIT batched captioning by @gante in #21738
- Added Type Hints for modeling_tf_encoder_decoder.py by @Batese2001 in #21673
- Auto api Value Error addition to Troubleshoot by @MKhalusova in #21708
- [deepspeed tests] fix issues introduced by #21700 by @stas00 in #21769
- Graphormer fix by @clefourrier in #21699
- fix: Change is_last chunk calc and add conditional break in chunk_iter by @connor-henderson in #21612
- [Flax] adding support for batch norm layers by @Shubhamai in #21581
- [Examples] Generalise run audio classification for log-mel models by @sanchit-gandhi in #21756
- Different behavior in DistilBERT when using "inputs_embeds" by @ArthurZucker in #21752
- [Flax] Fix erroneous kwargs being passed to generate config by @sanchit-gandhi in #21765
- Generate - update cookie cutters to not initialize cache with training and gradient checkpointing by @gante in #21759
- [time series] updated expected values for integration test. by @kashif in #21762
- [GPT2, ProphetNet] Fix gradient checkpointing bug by @yhl48 in #21772
- [SpeechT5] Fix HiFiGAN tests by @sanchit-gandhi in #21788
- Fix resume_from_checkpoint for deepspeed by @mosheber in #21735
- [examples/summarization] deal with `max_length` and `num_beams` by @bofenghuang in #21740
- Fix type in gpt2 config docstring by @WeberJulian in #21782
- Fix en documentation typos by @tpaviot in #21799
- [FX tracer] Make `concrete_args` from outside available by @lygztq in #21775
- [torch] remove deprecated uint8 in favor of bool by @ArthurZucker in #21384
- [`tests`] add `accelerate` marker by @younesbelkada in #21743
- Fix PyTorch Perceiver `PerceiverFourierPositionEncoding` with fp16 by @fxmarty in #21787
- Fix nn.init.trunc_normal_ call on torch.float16 data by @fxmarty in #21789
- Fix gradient checkpointing bug in gptneox by @KMFODA in #21815
- Inheritance-based framework detection by @gante in #21784
- Fix quality with `ruff==0.0.253` by @ydshieh in #21828
- introduce `logger.warning_once` and use it for grad checkpointing code by @stas00 in #21804
- Rename `MobileViTModelTest` to `TFMobileViTModelTest` by @ydshieh in #21825
- Fix gradient checkpointing bug BioGpt by @saswatmeher in #21844
- check for None forced tokens by @andyehrenberg in #21793
- Fix gradient checkpointing bug in git by @KMFODA in #21818
- Fix gradient checkpointing imagegpt by @KMFODA in #21816
- Fix tf random token masking probability in data collator by @anruijian in #21834
- [`T5`] Fix torchquant issue by @younesbelkada in #21843
- [`Blip2`] Add `Blip2Model` by @younesbelkada in #21817
- Fix the issue of blip model returning loss even when the label is not provided. by @raghavanone in #21811
- [GPTJ] Fix gradient checkpointing bug by @krypticmouse in #21794
- Add: task guide for zero shot object detection by @MKhalusova in #21829
- Make Slack CI reporting stronger by @ydshieh in #21823
- [`Blip2`] Fix Blip-2 multi gpu by @younesbelkada in #21707
- 🔥Rework pipeline testing by removing `PipelineTestCaseMeta` 🚀 by @ydshieh in #21516
- Improve TF weight loading, especially PT crossloading by @Rocketknight1 in #21792
- Fix flaky test for log level by @sgugger in #21776
- prepare for "floordiv is deprecated and its behavior will change in a future version of pytorch" by @ArthurZucker in #20211
- [ConvBert] Fix #21523 by @ArthurZucker in #21849
- Flax beam search fix by @andyehrenberg in #21857
- Fix gradient checkpointing bug Bart by @saswatmeher in #21866
- [deepspeed] check whether model is NLP one instead of counting on input type by @izapolsk in #21800
- Change the way tensor is reshaped in BartAttention (from .view to .reshape) by @raghavanone in #21860
- Italian translation of community.mdx by @lorenzobalzani in #21871
- [`Blip`] Fix blip doctest by @younesbelkada in #21868
- Removed BLIP mention from the troubleshooting guide by @MKhalusova in #21872
- update FSDP and add XLA-FSDP documentation by @pacman100 in #21812
- [doc] deepspeed tests by @stas00 in #21859
- Add an utility file to get information from test files by @ydshieh in #21856
- Add check for different embedding types in examples by @Rocketknight1 in #21881
- Make loading of pretrained gpt2 faster by avoiding initialization of Conv1D's weights by @twaka in #21879
- Fix Gradient checkpointing bug BigBird by @saswatmeher in #21882
- Fix `test_load_default_pipelines_pt` for `ClapModel` by @ydshieh in #21886
- fix checkpoint by @ArthurZucker in #21874
- [Refactor] Relative imports wherever we can by @ArthurZucker in #21880
- [ZAC] fix ci daily by @ArthurZucker in #21893
- Use PyAV instead of Decord in examples by @amyeroberts in #21572
- Add `inputs_embeds` functionality when generating with BioGPT by @sidkiblawi in #21889
- [T5 doc] Fix confusing documentation about `d_kv` by @ArthurZucker in #21896
- fix typo in Bart's attention by @kashif in #21898
- [GPT-J] add deprecation warning by @ArthurZucker in #21869
- fsdp bf16 enable autocast by @pacman100 in #21847
- Fix gradient checkpointing bug LED by @KMFODA in #21840
- Fix gradient checkpointing bug M2M 100 by @KMFODA in #21841
- Fix gradient checkpointing bug marian by @KMFODA in #21842
- Mark pipeline tests to skip them easily by @sgugger in #21887
- Clean up auto mapping names by @ydshieh in #21903
- Prophetnet batch dimension inversion fix by @kiansierra in #21870
- Make schedulers picklable by making lr_lambda fns global by @connor-henderson in #21768
- Add Blip and Blip2 for pipeline tests by @ydshieh in #21904
- Temporarily skip 3 tests in `BridgeTowerModelTest` by @ydshieh in #21908
- Faster zero shot image by @Narsil in #21897
- [time series] Add Time series inputs tests by @kashif in #21846
- Avoid modeling tests run in pipeline CI jobs by @ydshieh in #21911
- Fix doctests for TFVisionTextDualEncoder by @Rocketknight1 in #21910
- faster forward following what is done for images by @ArthurZucker in #21906
- Fix gradient checkpointing bug in MBart by @KMFODA in #21918
- Fix gradient checkpointing bug in mvp by @KMFODA in #21920
- Fix gradient checkpointing megatron bert by @KMFODA in #21921
- Use large VM for `repo_utils_job` by @ydshieh in #21928
- Cleanup more auto mapping names by @ydshieh in #21909
- feat: filter try/except when looking at custom code by @zanussbaum in #21914
- Fix `AlignModelTest` tests by @ydshieh in #21923
- Avoid failure in `check_repo.py` due to missing backends by @ydshieh in #21930
- Fix wrong documentation about DataCollator padding defaults by @substanc3-dev in #21919
- [Flan-UL2] Add-flan-ul2 by @ArthurZucker in #21929
- Update README logo by @gary149 in #21933
- [CLAP] Support batched inputs for CLAP. Fixes pipeline issues by @ArthurZucker in #21931
- Fix gradient checkpointing bug in OPT by @KMFODA in #21943
- Fix gradient checkpointing bug in Pegasus by @KMFODA in #21944
- Fix gradient checkpointing bug in Rembert by @KMFODA in #21945
- Fix gradient checkpointing bug in Roformer by @KMFODA in #21946
- Fixed gradient_checkpointing/use_cache bug in blenderbot by @Batese2001 in #21833
- Update expected values in `XLMProphetNetModelIntegrationTest` by @ydshieh in #21957
- [CI] Fix ci by @ArthurZucker in #21940
- Disable DDP for neuron by @aws-sangeetha in #21953
- Fix bert issue by @saswatmeher in #21963
- [Generate] Fix gradient_checkpointing and use_cache bug for BLOOM by @asrimanth in #21956
- Add missing parameter definition in layoutlm config by @Atomnp in #21960
- Use larger atol in `torch.allclose` for some tests by @ydshieh in #21966
- Add TF contrastive image text finetuning example by @Rocketknight1 in #21939
- Update expected values for `test_xglm_sample` by @ydshieh in #21975
- Fix gradient checkpointing bug in BigBird Pegasus by @KMFODA in #21976
- Fix gradient checkpointing bug in Blenderbot Small by @KMFODA in #21977
- Fix gradient checkpointing bug in BlipText by @KMFODA in #21978
- Fix gradient checkpointing bug in Codegen by @KMFODA in #21979
- Fix gradient checkpointing bug in ESM by @KMFODA in #21980
- docs: improve clarity for language modeling by @pdhall99 in #21952
- Update `Jukebox` tests by @ydshieh in #21984
- Add check before int casting for PIL conversion by @amyeroberts in #21969
- Fix MinNewTokensLengthLogitsProcessor when used with a list of eos tokens by @eladsegal in #21959
- [DETR, YOLOS] Fix device bug by @NielsRogge in #21974
- Remove unneeded casts to bool by @regisss in #21983
- Update `notification_service.py` by @ydshieh in #21992
- Skip `test_multi_gpu_data_parallel_forward` for some model tests by @ydshieh in #21991
- Stop requiring Torch for our TF examples! by @Rocketknight1 in #21997
- [TF] Fix creating a PR while pushing in TF framework by @ArthurZucker in #21968
- [DETR and friends] Remove is_timm_available by @NielsRogge in #21814
- Update tiny model creation script and some others files by @ydshieh in #22006
- Generate - add 1 to cur_len to make up the new beam length by @jimmieliu in #21993
- VideoMAE doctest - use valid dummy pixel values by @amyeroberts in #22022
- update: bertology paper by @QiushiSun in #22012
- Update `AudioClassificationPipelineTests::test_small_model_pt` for PT 2.0.0 by @ydshieh in #22023
- [`bnb`] Fix bnb error message by @younesbelkada in #22026
- Fix test for torchneuroncore in Trainer by @sgugger in #22028
- Add tokenize_kwargs parameter definition in the FeatureExtractionPipeline by @anruijian in #22031
- [examples/speech-recognition] Add SpecAugment to run_speech_recognition_seq2seq.py by @bofenghuang in #21942
- Avoid `text_config_dict` and `vision_config_dict` being saved for CLIP-like models by @ydshieh in #22035
- Mark all `BridgeTower` tests slow for now by @ydshieh in #22039
- Bug fix: token classification pipeline while passing offset_mapping by @cceyda in #22034
- Update ALIGN docs by @alaradirik in #22025
- [21737][T5]: Fix gradient checkpoint bug by @nipunjindal in #22036
- Docs Improvement - In ZSH, not using ' ' around pip install fails, fix it by @shaun-scale in #22045
- Can't install tf2 on M1 Chip by default by @shaun-scale in #22046
- Remove set_access_token usage + fail tests if FutureWarning by @Wauplin in #22051
- Show the number of `huggingface_hub` warnings in CI report by @ydshieh in #22054
- Return analysis for hyperparameter_search with Ray backend by @anruijian in #22040
- pt-to-tf model architecture override by @Rocketknight1 in #22055
- rm $ symbol from code block from contributing.md by @kamalkraj in #22057
- [deepspeed] offload + non-cpuadam optimizer exception by @stas00 in #22043
- Edit the docstring of `image_processing_donut` to match code by @vermouthmjl in #22033
- Add setters by type of args to TrainingArguments by @sgugger in #21570
- Update tiny model creation script by @ydshieh in #22058
- Fix case when using --gradient_accumulation_steps with DDP disabled. by @aws-sangeetha in #22007
- Add a progress bar for the total download of shards by @sgugger in #22062
- Fix gradient checkpointing bug in Speech2Text by @KMFODA in #22079
- Fix gradient checkpointing bug in switch transformer by @KMFODA in #22081
- [GPT2] Propose fix for #21080 by @ArthurZucker in #21853
- Fix small typo in flan-ul2.mdx by @kevin51jiang in #22068
- Generate - Fix broken documentation links by @gante in #22078
- Fix gradient checkpointing bug in Speecht5 by @KMFODA in #22080
- Fix hint in src/transformers/modeling_utils.py by @J-shang in #22074
- handle numpy inputs in whole word mask data collator by @dwyatte in #22032
- GPT-J specific half precision on CPU note by @MKhalusova in #22086
- Fix imports of TF MobileViT by @sgugger in #22065
- Revert "[GPT2] Propose fix for #21080" by @ydshieh in #22093
- Add AutoModelForZeroShotImageClassification by @alaradirik in #22087
- add new model of MGP-STR by @wdp-007 in #21418
- Add pr_checks.mdx Italian translation by @alexcalabrese in #17459
- Fix gradient checkpointing bug in xglm by @KMFODA in #22127
- Add TFVisionTextDualEncoder by @Rocketknight1 in #21873
- Fix gradient checkpointing bug in Trajectory Transformer by @KMFODA in #22125
- Fix gradient checkpointing bug in xlm_roberta_xl by @KMFODA in #22128
- Added big_models.mdx italian translation #17600 by @nickprock in #22115
- [`Blip2`] skip accelerate test by @younesbelkada in #22124
- Fix gradient checkpointing bug in xmod by @KMFODA in #22129
- Fix gradient checkpointing bug in LongT5 by @KMFODA in #22130
- Fix gradient checkpointing bug in trocr by @KMFODA in #22126
- Zero-shot image classification task guide by @MKhalusova in #22132
- Fix doc link for MGP-STR by @sgugger in #22138
- Adding Type Hints to TF_Pegasus model by @mollerup23 in #21941
- Add a new script to check model testers' config by @ydshieh in #22063
- Update configuration_align.py (projected_dim=640) by @bishmdl76 in #22139
- Trainer: let generate pick its inputs by @gante in #22108
- Enforce same behavior as PyTorch 2.0 for older versions by @sgugger in #22136
- [trainer] fix bug in grad accum with multiple epochs by @stas00 in #22098
- [deepspeed docs] Activation Checkpointing by @stas00 in #22099
- Remove backend check for torch.compile by @sgugger in #22140
- Prepare daily CI for torch 2.0.0 by @ydshieh in #22135
- docs: New terms and updates to glossary by @MichaelRipa in #21982
- Move `is_pipeline_test_to_skip` to specific model test classes by @ydshieh in #21999
- Add ConvNeXT V2 by @alaradirik in #21679
- Update 2 doctest expected values for torch 2.0.0 by @ydshieh in #22148
- Translation Italian: perf_train_cpu and perf_train_cpu_many by @nickprock in #22151
- Fix big model inference for T5 models in float16 by @sgugger in #22095
- Create MaskedImageCompletionOutput and fix ViT docs by @alaradirik in #22152
- to_pil - don't rescale if int and in range 0-255 by @amyeroberts in #22158
- [trainer] add `--optim adamw_torch_fused` for pt-2.0+ by @stas00 in #22144
- Revert "Enforce same behavior as PyTorch 2.0 for older versions" by @sgugger in #22163
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @abhiwand
    - Add BridgeTower model (#20775)
    - Add loss for BridgeTowerForMaskedLM and BridgeTowerForImageAndTextRetrieval (#21684)
    - [WIP] Add BridgeTowerForContrastiveLearning (#21964)
- @wonhyeongseo
    - [i18n-KO] Translated quicktour page to Korean (#20946)
- @ErwannMillon
    - Add VQGAN-CLIP research project (#21329)
- @NoB0
    - [i18n-fr] Translate index page to French (#21458)
    - [i18n-fr] Translate quicktour page to French (#21589)
- @jvamvas
    - [WIP] Move X-MOD models to facebook organization (#21640)
- @susnato
    - Add Ernie-M Model to huggingface (#21349)
- @zinengtang
    - Add TVLT (#20725)
- @andyehrenberg
    - add flax whisper implementation (#20479)
    - check for None forced tokens (#21793)
    - Flax beam search fix (#21857)
- @tanreinama
    - add GPTSAN model (reopen) (#21291)
- @jonatanklosko
    - Add WhisperTokenizerFast (#21222)
- @Skylion007
    - Apply ruff flake8-comprehensions (#21694)
- @kiansierra
    - Prophetnet batch dimension inversion fix (#21870)
- @elisim
    - [Time-Series] informer model (#21099)
- @wdp-007
    - add new model of MGP-STR (#21418)