GenerationConfig
The generate method has multiple arguments whose defaults used to live in the model config. We have now decoupled these into a separate generation config, which makes it easier to store different sets of parameters for a given model, each with its own generation strategy. While we will keep supporting generate arguments in the model configuration for the foreseeable future, using a generation config is now the recommended approach. You can learn more about its uses and read its API reference in the documentation.
- Generate: use GenerationConfig as the basis for .generate() parametrization by @gante in #20388
- Generate: TF uses GenerationConfig as the basis for .generate() parametrization by @gante in #20994
- Generate: FLAX uses GenerationConfig as the basis for .generate() parametrization by @gante in #21007
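As a minimal sketch (assuming transformers v4.26 or later), the snippet below builds two GenerationConfig objects for the same model, saves them side by side in a temporary local directory, and loads each one back. The parameter values and the second file name, creative_generation_config.json, are illustrative; without config_file_name the config is stored as generation_config.json.

```python
import tempfile

from transformers import GenerationConfig

# Two illustrative decoding strategies for the same model (values are arbitrary).
greedy = GenerationConfig(max_new_tokens=32, do_sample=False)
creative = GenerationConfig(max_new_tokens=64, do_sample=True, top_k=50, temperature=0.9)

with tempfile.TemporaryDirectory() as model_dir:
    # Saved under the default file name, generation_config.json.
    greedy.save_pretrained(model_dir)
    # A second parameter set can live next to it under another (hypothetical) file name.
    creative.save_pretrained(model_dir, config_file_name="creative_generation_config.json")

    default_loaded = GenerationConfig.from_pretrained(model_dir)
    creative_loaded = GenerationConfig.from_pretrained(
        model_dir, config_file_name="creative_generation_config.json"
    )

print(default_loaded.max_new_tokens, creative_loaded.do_sample, creative_loaded.top_k)
```

Either object can then be handed to a model through the generation_config argument, e.g. model.generate(**inputs, generation_config=creative_loaded); keyword arguments passed directly to generate still override individual fields of the config.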
ImageProcessor
In the vision integration, all feature extractor classes have been deprecated and renamed to ImageProcessor. The old feature extractors will be fully removed in version 5 of Transformers, and new vision models will only implement the ImageProcessor class, so be sure to switch your code to the new name sooner rather than later!
- Add deprecation warning when image FE instantiated by @amyeroberts in #20427
- Vision processors - replace FE with IPs by @amyeroberts in #20590
- Replace FE references by @amyeroberts in #20702
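A minimal sketch of the new name in use, assuming transformers v4.26 or later with the vision extras (Pillow) installed. To avoid a download, ViTImageProcessor is constructed locally with explicit settings rather than loaded with from_pretrained, and the input is a random dummy image; in existing code the switch is usually a matter of replacing a class such as ViTFeatureExtractor with ViTImageProcessor.

```python
import numpy as np

from transformers import ViTImageProcessor

# Explicit settings instead of a checkpoint; normally you would call
# ViTImageProcessor.from_pretrained(<checkpoint>) instead.
image_processor = ViTImageProcessor(
    do_resize=True,
    size={"height": 224, "width": 224},
    do_normalize=True,
    image_mean=[0.5, 0.5, 0.5],
    image_std=[0.5, 0.5, 0.5],
)

# A dummy 480x640 RGB image in (height, width, channels) layout.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)

# Resize to 224x224, rescale to [0, 1], then normalize to roughly [-1, 1].
inputs = image_processor(images=image, return_tensors="np")
pixel_values = inputs["pixel_values"]
print(pixel_values.shape)
```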
New models
AltCLIP
AltCLIP is a variant of CLIP obtained by switching the text encoder with a pretrained multilingual text encoder (XLM-RoBERTa). It performs very close to CLIP on almost all tasks and extends the original CLIP's capabilities to multilingual understanding.
- Add AltCLIP by @jongjyh in #20446
BLIP
BLIP is a model that is able to perform various multi-modal tasks including visual question answering, image-text retrieval (image-text matching) and image captioning.
- Add BLIP by @younesbelkada in #20716
BioGPT
BioGPT is a domain-specific generative pre-trained Transformer language model for biomedical text generation and mining. BioGPT follows the Transformer language model backbone, and is pre-trained on 15M PubMed abstracts from scratch.
- Add BioGPT by @kamalkraj in #20420
BiT
BiT is a simple recipe for scaling up pre-training of ResNet-like architectures (specifically, ResNetv2). The method results in significant improvements for transfer learning.
- Add BiT + ViT hybrid by @NielsRogge in #20550
EfficientFormer
EfficientFormer proposes a dimension-consistent pure transformer that can run on mobile devices, for image classification as well as dense prediction tasks like object detection and semantic segmentation.
- Efficientformer by @Bearnardd in #20459
GIT
GIT is a decoder-only Transformer that leverages CLIP’s vision encoder to condition the model on vision inputs besides text. The model obtains state-of-the-art results on image captioning and visual question answering benchmarks.
- Add GIT (GenerativeImage2Text) by @NielsRogge in #20295
GPT-sw3
GPT-Sw3 is a collection of large decoder-only pretrained transformer language models that were developed by AI Sweden in collaboration with RISE and the WASP WARA for Media and Language. GPT-Sw3 has been trained on a dataset containing 320B tokens in Swedish, Norwegian, Danish, Icelandic, English, and programming code. The model was pretrained using a causal language modeling (CLM) objective utilizing the NeMo Megatron GPT implementation.
- Add gpt-sw3 model to transformers by @ekgren in #20209
Graphormer
Graphormer is a Graph Transformer model, modified to compute on graphs instead of text sequences: embeddings and structural features of interest are generated during preprocessing and collation, and then used by a modified attention mechanism.
- Graphormer model for Graph Classification by @clefourrier in #20968
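One well-known part of Graphormer's modified attention is its spatial encoding: a learnable scalar, indexed by the shortest-path distance between two nodes, is added to the attention logits. The sketch below illustrates that idea in plain PyTorch under simplifying assumptions (one graph, one head, no edge or centrality encodings). shortest_path_distances and SpatialBiasAttention are hypothetical names for illustration, not the Transformers implementation, which precomputes such distances during preprocessing and collation.

```python
import torch
import torch.nn as nn


def shortest_path_distances(adjacency: torch.Tensor, max_distance: int) -> torch.Tensor:
    """All-pairs hop counts via Floyd-Warshall; distances above max_distance share one bucket."""
    num_nodes = adjacency.shape[0]
    far = max_distance + 1  # bucket for "unreachable or farther than max_distance"
    dist = torch.full((num_nodes, num_nodes), far, dtype=torch.long)
    dist[adjacency.bool()] = 1
    dist.fill_diagonal_(0)
    for k in range(num_nodes):
        # Relax every pair (i, j) through intermediate node k.
        dist = torch.minimum(dist, dist[:, k : k + 1] + dist[k : k + 1, :])
    return dist


class SpatialBiasAttention(nn.Module):
    """Hypothetical single-head self-attention with a learnable bias per hop distance."""

    def __init__(self, hidden_dim: int, max_distance: int):
        super().__init__()
        self.qkv = nn.Linear(hidden_dim, 3 * hidden_dim)
        # One scalar per distance 0..max_distance, plus one for the "far" bucket.
        self.distance_bias = nn.Embedding(max_distance + 2, 1)
        self.scale = hidden_dim ** -0.5

    def forward(self, node_states: torch.Tensor, distances: torch.Tensor) -> torch.Tensor:
        q, k, v = self.qkv(node_states).chunk(3, dim=-1)
        scores = (q @ k.transpose(-2, -1)) * self.scale
        # The graph structure enters as an additive term on the attention logits.
        scores = scores + self.distance_bias(distances).squeeze(-1)
        return scores.softmax(dim=-1) @ v


# Path graph 0 - 1 - 2 - 3; distances above 2 collapse into a single bucket.
adjacency = torch.tensor(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
)
distances = shortest_path_distances(adjacency, max_distance=2)
attention = SpatialBiasAttention(hidden_dim=8, max_distance=2)
output = attention(torch.randn(4, 8), distances)
print(distances.tolist())  # [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
```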
Mask2Former
Mask2Former is a unified framework for panoptic, instance and semantic segmentation and features significant performance and efficiency improvements over MaskFormer.
- Add Mask2Former by @alaradirik and @shivalikasingh95 in #20792
OneFormer
OneFormer is a universal image segmentation framework that can be trained on a single panoptic dataset to perform semantic, instance, and panoptic segmentation tasks. OneFormer uses a task token to condition the model on the task in focus, making the architecture task-guided for training, and task-dynamic for inference.
- Add OneFormer Model by @praeclarumjj3 in #20577
RoBERTa-PreLayerNorm
The RoBERTa-PreLayerNorm model is identical to RoBERTa but uses the --encoder-normalize-before flag in fairseq.
- Implement Roberta PreLayerNorm by @AndreasMadsen in #20305
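In fairseq, --encoder-normalize-before moves each LayerNorm in front of its sublayer (pre-LN) instead of applying it after the residual addition (post-LN, as in the original BERT and RoBERTa). The PyTorch sketch below contrasts the two placements on a single feed-forward sublayer; ResidualFeedForward is a hypothetical illustration, not the modeling code shipped in Transformers.

```python
import torch
import torch.nn as nn


class ResidualFeedForward(nn.Module):
    """Hypothetical feed-forward sublayer with post-LN or pre-LN placement."""

    def __init__(self, hidden_dim: int, normalize_before: bool):
        super().__init__()
        self.normalize_before = normalize_before
        self.layer_norm = nn.LayerNorm(hidden_dim)
        self.ff = nn.Sequential(
            nn.Linear(hidden_dim, 4 * hidden_dim),
            nn.GELU(),
            nn.Linear(4 * hidden_dim, hidden_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.normalize_before:
            # Pre-LN: normalize the sublayer input; the residual stream is left un-normalized.
            return x + self.ff(self.layer_norm(x))
        # Post-LN: add the residual first, then normalize the sum.
        return self.layer_norm(x + self.ff(x))


torch.manual_seed(0)
hidden_states = torch.randn(2, 5, 16)  # (batch, sequence, hidden)
post_ln = ResidualFeedForward(16, normalize_before=False)
pre_ln = ResidualFeedForward(16, normalize_before=True)

post_out = post_ln(hidden_states)
pre_out = pre_ln(hidden_states)
# Post-LN outputs are normalized per token, so their per-token mean is ~0.
print(post_out.mean(dim=-1).abs().max().item())
```

With pre-LN, the input reaches the output through an identity path, which is commonly credited with more stable training of deep stacks.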
Swin2SR
Swin2SR improves the SwinIR model by incorporating Swin Transformer v2 layers, which mitigate issues such as training instability, resolution gaps between pre-training and fine-tuning, and data hunger.
- Add Swin2SR by @NielsRogge in #19784
TimeSformer
TimeSformer is the first video transformer. It inspired many transformer-based video understanding and classification papers.
- [New Model] Add TimeSformer model by @fcakyon in #18908
UPerNet
UPerNet is a general framework to effectively segment a wide range of concepts from images, leveraging any vision backbone like ConvNeXt or Swin.
- Add UperNet by @NielsRogge in #20648
ViT Hybrid
ViT hybrid is a slight variant of the plain Vision Transformer that leverages a convolutional backbone (specifically, BiT) whose features are used as initial "tokens" for the Transformer. It's the first architecture that attains results similar to familiar convolutional architectures.
- Add BiT + ViT hybrid by @NielsRogge in #20550
Backbones
Breaking the one-model-per-file policy a bit, we introduce backbones (mainly for vision models) that can be reused in more complex models like DETR, MaskFormer, Mask2Former, etc.
- [NAT, DiNAT] Add backbone class by @NielsRogge in #20654
- Add Swin backbone by @NielsRogge in #20769
- [DETR and friends] Use AutoBackbone as alternative to timm by @NielsRogge in #20833
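A minimal sketch of what a backbone returns, assuming transformers v4.26 or later: a randomly initialized ResNetBackbone is built from a ResNetConfig (so nothing is downloaded), and out_features requests the feature maps of all four stages. This multi-scale output is the kind of input that segmentation and detection heads such as MaskFormer consume; the sizes follow ResNetConfig's ResNet-50-style defaults.

```python
import torch

from transformers import ResNetBackbone, ResNetConfig

# Random weights; out_features selects which stages' feature maps are returned.
config = ResNetConfig(out_features=["stage1", "stage2", "stage3", "stage4"])
backbone = ResNetBackbone(config).eval()

pixel_values = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    outputs = backbone(pixel_values)

# One feature map per requested stage, at decreasing spatial resolution.
for name, feature_map in zip(config.out_features, outputs.feature_maps):
    print(name, tuple(feature_map.shape))
```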
Bugfixes and improvements
- fix cuda OOM by using single Prior by @ArthurZucker in #20486
- Add ESM contact prediction by @Rocketknight1 in #20535
- flan-t5.mdx: fix link to large model by @szhublox in #20555
- Fix torch device issues by @ydshieh in #20584
- Fix flax GPT-J-6B linking model in tests by @JuanFKurucz in #20556
- [Vision] fix small nit on BeitDropPath layers by @younesbelkada in #20587
- Install natten with CUDA version by @ydshieh in #20546
- Add entries to FEATURE_EXTRACTOR_MAPPING_NAMES by @ydshieh in #20551
- Cleanup some config attributes by @ydshieh in #20554
- [Whisper] Move decoder id method to tokenizer by @sanchit-gandhi in #20589
- Add require_torch to 2 pipeline tests by @ydshieh in #20585
- Install tensorflow_probability for TF pipeline CI by @ydshieh in #20586
- Ci-whisper-asr by @ArthurZucker in #20588
- cross platform from_pretrained by @ArthurZucker in #20538
- Make convert_to_onnx runnable as script again by @mcernusca in #20009
- ESM openfold_utils type hints by @ringohoffman in #20544
- Add RemBERT ONNX config by @hchings in #20520
- Fix link to Swin Model contributor novice03 by @JuanFKurucz in #20557
- Fix link to swin transformers v2 microsoft model by @JuanFKurucz in #20558
- Fix link to table transformer detection microsoft model by @JuanFKurucz in #20560
- clean up unused classifier_dropout in config by @ydshieh in #20596
- Fix whisper and speech to text doc by @ArthurZucker in #20595
- Replace set-output by $GITHUB_OUTPUT by @ydshieh in #20547
- [Vision] .to function for ImageProcessors by @younesbelkada in #20536
- [Whisper] Fix decoder ids methods by @sanchit-gandhi in #20599
- Add-whisper-conversion by @ArthurZucker in #20600
- README in Hindi 🇮🇳 by @pacman100 in #20097
- Fix code sample in preprocess by @stevhliu in #20561
- Split autoclasses on modality by @stevhliu in #20559
- Fix test for file not found by @sgugger in #20604
- Rework the pipeline tutorial by @Narsil in #20437
- Documentation fixes by @samuelzxu in #20607
- Adding anchor links to Hindi README by @pacman100 in #20606
- exclude jit time from the speed metric calculation of evaluation and prediction by @sywangyi in #20553
- Check if docstring is None before formatting it by @xxyzz in #20592
- updating T5 and BART models to support Prefix Tuning by @pacman100 in #20601
- Fix AutomaticSpeechRecognitionPipelineTests.run_pipeline_test by @ydshieh in #20597
- Ci-jukebox by @ArthurZucker in #20613
- Update some GH action versions by @ydshieh in #20537
- Fix dtype of weights in from_pretrained when device_map is set by @sgugger in #20602
- add missing is_decoder param by @stevhliu in #20631
- Fix link to speech encoder decoder model in speech recognition readme by @JuanFKurucz in #20633
- Fix natten installation in docker file by @ydshieh in #20632
- Clip floating point constants to bf16 range to avoid inf conversion by @aws-sangeetha in #20605
- Pin TensorFlow to the next release by @sgugger in #20635
- [MaskFormer] Add support for ResNet backbone by @NielsRogge in #20483
- [Trainer] add error when passing 8bit models by @younesbelkada in #20651
- [ViTHybrid] + [BiT] cleaner __init__ by @younesbelkada in #20649
- Update summarization run_pipeline_test by @ydshieh in #20623
- pin TF 2.11 in docker files by @ydshieh in #20642
- Speed up git-lfs detection on error by @xloem in #20641
- Updated Trainer args typing by @julianmack in #20655
- Add dpt-hybrid support by @younesbelkada in #20645
- [Whisper] Fix forced decoder ids by @sanchit-gandhi in #20652
- Add TFBartForSequenceClassification by @uglyboxer in #20570
- run_speech_recognition_seq2seq.py: add cache_dir param to dataset by @eschmidbauer in #20540
- [BiT] Small patch fix by @younesbelkada in #20657
- Fix gpt2 fp16 training when tracing is enabled by @JingyaHuang in #20656
- Fix load from PT-formatted checkpoint in composite TF models by @sgugger in #20661
- Update the list of contributors to reflect current organization by @sgugger in #20603
- Fix expected values for TF-ESM tests by @Rocketknight1 in #20680
- Add BackboneMixin by @ydshieh in #20660
- Migrate torchdynamo to torch.compile by @sgugger in #20634
- Whitelist Transformers private method in DummyObject by @sgugger in #20681
- [ViTHybrid] Fix accelerate slow tests by @younesbelkada in #20679
- Enable bf16 option for XLA devices by @jeffhataws in #20684
- Fix CIs for PyTorch 1.13 by @ydshieh in #20686
- Fix donut image processor by @amyeroberts in #20625
- Added missing test_tokenization_led by @IMvision12 in #20568
- Add video classification pipeline by @nateraw in #20151
- [Backbones] Improve out features by @NielsRogge in #20675
- Change transformers.onnx to use optimum.exporters.onnx by @michaelbenayoun in #20529
- skip test_multi_gpu_data_parallel_forward for MaskFormerSwinModelTest by @ydshieh in #20688
- [ViTHybrid] fix last accelerate slow test by @younesbelkada in #20705
- Fix rendering issue in quicktour by @sgugger in #20708
- Made LUKE Tokenizer independent from RoBERTa by @salvo96 in #20720
- Spanish translation of asr.mdx and add_new_pipeline.mdx by @alceballosa in #20569
- Add accelerate support for LongT5 models by @pszemraj in #20341
- Fix AutoModelTest.test_model_from_pretrained by @ydshieh in #20730
- Adding ValueError when incompatible parameters are used. by @Narsil in #20729
- Add type hints for Whisper models by @donelianc in #20396
- Very small edit to change name to OpenAI GPT by @stanleycai95 in #20722
- fsdp fix by @pacman100 in #20719
- Spanish translation of the file debugging.mdx by @SimplyJuanjo in #20566
- Convert tokenizer outputs for Keras in doc example by @Rocketknight1 in #20732
- Clarify return_tensor and return_text parameters by @stevhliu in #20662
- Add vision requirement to image transforms by @amyeroberts in #20712
- Add a progress bar for large model loading by @sgugger in #20713
- Disambiguate test for required_input in tokenization base file. by @sgugger in #20731
- Add decorator for flaky Donut tests by @amyeroberts in #20739
- rename layoutlm_job to exotic_models_job by @ydshieh in #20736
- Update CI to torch 1.13.0 by @ydshieh in #20687
- Add keep_in_fp32_modules support by @younesbelkada in #20683
- Change a logic in pipeline test regarding TF by @ydshieh in #20710
- Fix AdamWeightDecay for TF 2.11 by @Rocketknight1 in #20735
- in the resize() function in image_transforms.py, the line 267: by @dhansmair in #20728
- Add docs xlm roberta by @hazrulakmal in #20742
- Fixing the pipeline tutorial test by @Narsil in #20746
- Uninstall torch_tensorrt in DeepSpeed CI image for now by @ydshieh in #20758
- Remove image_transforms functions from init by @amyeroberts in #20704
- Fix missing () in some usage of is_flaky by @ydshieh in #20749
- [Tests] Improve test_attention_outputs by @NielsRogge in #20701
- Fix attribute error problem by @casuallyName in #20765
- [CI-Test] Fixes but also skips the mT5 tests by @ArthurZucker in #20755
- Replaces xxx_required with requires_backends by @amyeroberts in #20715
- Install torch-tensorrt 1.3.0 for DeepSpeed CI by @ydshieh in #20764
- Even more validation. by @Narsil in #20762
- Install vision for TF pipeline tests by @ydshieh in #20771
- Patch for FlanT5-XXL 8bit support by @larsmennen in #20760
- [Pipeline] fix failing bloom pipeline test by @younesbelkada in #20778
- Fixing object detection with layoutlm by @Narsil in #20776
- Install video dependency for pipeline CI by @ydshieh in #20777
- Move convert_to_rgb to image_transforms module by @amyeroberts in #20784
- Recompile apex in DeepSpeed CI image by @ydshieh in #20788
- [Pipeline] skip feature extraction test if in IMAGE_PROCESSOR_MAPPING by @younesbelkada in #20790
- Fix object detection2 by @Narsil in #20798
- Stop calling expand_1d on newer TF versions by @Rocketknight1 in #20786
- Add Universal Segmentation class + mapping by @NielsRogge in #20766
- Install sentencepiece in DeepSpeed CI image by @ydshieh in #20795
- lazy import torch._softmax_backward_data for better compatibility by @daquexian in #20796
- [Vision] [Refactor] Initialize weights on the correct place by @younesbelkada in #20803
- Vilt - use image_transforms pad by @amyeroberts in #20780
- [clip] fix error message by @stas00 in #20818
- Add model resources for ViT by @stanleycai95 in #20723
- fix typo output not ouput in bitsandbytes trainer test by @Thomas-MMJ in #20839
- Fix tiny typo by @fzyzcjy in #20841
- Remove unused max_position_embeddings in config classes by @ydshieh in #20836
- [mBART] fix erroneous italics in docstring by @sanchit-gandhi in #20835
- TF AdamWeightDecay fix for 2.11 by @Rocketknight1 in #20848
- remove unused use_cache in config classes by @ydshieh in #20844
- [SegFormer] Add support for segmentation masks with one label by @NielsRogge in #20279
- Clarify use_fast parameter in docstring by @stevhliu in #20840
- [S2T, Whisper] Add copied from statements by @sanchit-gandhi in #20787
- Embed circle packing chart for model summary by @stevhliu in #20791
- [Swin2SR] Add doc tests by @NielsRogge in #20829
- [Examples] Update big table by @NielsRogge in #20845
- Use config.num_channels in CLIP-like modeling files by @ydshieh in #20857
- fix past_key_values in GPTNeoXForCausalLM.prepare_inputs_for_generation by @ValeKnappich in #20621
- Add visual prompt to processor of CLIPSeg model by @idilsulo in #20816
- Adding evaluate to the list of libraries required in generated notebooks by @MKhalusova in #20850
- Fix past CI by skipping LevitModelTest.test_problem_types by @ydshieh in #20859
- Fix whisper export by @mht-sharma in #20800
- Fix doctest by @ArthurZucker in #20843
- Add-warning-tokenizer by @ArthurZucker in #20826
- Update HubertModelIntegrationTest.test_inference_keyword_spotting by @ydshieh in #20863
- Generate: post-generate config doctest fix by @gante in #20804
- change strings to f-strings in image_processing_utils.py by @dhansmair in #20865
- [FSMT] Make it compatible with xxxForConditionalGeneration models by @younesbelkada in #20825
- [MobileNet-v2] Fix ONNX typo by @younesbelkada in #20860
- having new model entries in Hindi for Hindi README by @pacman100 in #20869
- Add Onnx Config for PoolFormer by @BakingBrains in #20868
- Adding support for fp16 for asr pipeline. by @Narsil in #20864
- Add script to convert T5X T5 (v1.0 and v1.1) checkpoints to PyTorch by @bastings in #20801
- Add japanese translation of template by @younesbelkada in #20870
- [RobertaPreLayernorm] Fixes the CI daily test by @ArthurZucker in #20886
- Fixes typo in the help text for --max_length by @makrai in #20883
- typo fix by @nathan-barry in #20891
- [T5] fix fp16 loading issue by @younesbelkada in #20878
- Update flan-t5 original model link by @kamalkraj in #20897
- fix docs typos in "add_new_model" by @elisim in #20900
- [Past CI] 🔥 Leave Past CI failures in the past 🔥 by @ydshieh in #20861
- Avoid collisions in writing metrics via 2 APIs - azureml + mlflow by @akshaya-a in #20837
- Generate: correctly detect default max length by @gante in #20911
- Remove non-breaking spaces by @aphedges in #20929
- Load the state dict on CPU to prevent unnecessary GPU memory surge by @HarshTrivedi in #20920
- Fix FP16 inference in TextGenerationPipeline by @bofenghuang in #20913
- Remove Bert tokenizer dependency from DistilBert (slow/fast) tokenizers by @IvanLauLinTiong in #20933
- Adds type checking to PreTrainedConfig. by @mmcdermott in #20926
- Fix error message in WhisperFeatureExtractor by @bofenghuang in #20936
- Fixing DistilBert error message by @samuelzxu in #20945
- [trainer: distributed_concat] ensure all_gather's inputs are contiguous by @stas00 in #20951
- Add generate kwargs to AutomaticSpeechRecognitionPipeline by @bofenghuang in #20952
- update pyknp to rhoknp by @conan1024hao in #20890
- Generate: TF XLA beam sample by @gante in #20927
- Fix T5 docstring by @IvanLauLinTiong in #20957
- Generate: delete unused TF _reorder_cache by @gante in #20964
- MinNewTokensLengthLogitsProcessor for .generate method #20814 by @kotikkonstantin in #20892
- Fix post_process_object_detection method descriptions by @alaradirik in #20977
- Remove more unused attributes in config classes by @ydshieh in #20858
- [run_clm example] add torch_dtype option for model load. by @sywangyi in #20971
- Fix valid ratio for Deformable Detr by @long8v in #20958
- Enable decoder_attention_mask in generate function by @samuelpullely in #20726
- Ignore errors when deleting old checkpoints in trainer by @akrogager in #20984
- Avoid CI runs under users' own CircleCI personal account by @ydshieh in #20981
- Fix for LXMERT by @ydshieh in #20986
- Improve OWL-ViT postprocessing by @alaradirik in #20980
- Fix race condition on cleaning checkpoints when save_total_limit set to 1 by @radcheb in #20989
- Add custom stop token ids for generation by @tokestermw in #20727
- update template by @ArthurZucker in #20885
- Add: doc page for the object detection task by @MKhalusova in #20925
- auxiliary_loss works for Deformable Detr by @long8v in #20959
- Update image processor parameters if creating with kwargs by @amyeroberts in #20866
- Fix bug in segmentation postprocessing by @alaradirik in #20198
- Don't call deprecated method by @amyeroberts in #20904
- Fix model hub link by @idilsulo in #20998
- Refactor the function get_results by @milyiyo in #20999
- Update bug report template by @stevhliu in #21004
- Remove T5 dependency from mT5 model by @SD-13 in #20949
- Update PR template by @stevhliu in #21006
- add: task guide on video classification model fine-tuning. by @sayakpaul in #20827
- Generate: Fix CI related to #20727 by @gante in #21003
- Fix (DeepSpeed) docker image build issue by @ydshieh in #21002
- Fix callback docstrings by @stevhliu in #21005
- Generate: post-generate config TF doctest fix by @gante in #21018
- Generate: FLAX infers pad token in its absence and has functional example by @gante in #21009
- [BLIP] Fix daily CI failing test by @younesbelkada in #20877
- Make sure dynamic objects can be saved and reloaded by @sgugger in #21008
- [CLIPSeg] Fix integration test by @NielsRogge in #20995
- Added mask_time_prob and mask_time_length arguments to wav2vec2 pretraining script by @mpierrau in #20985
- [NumPy] Remove references to deprecated NumPy type aliases by @hvaara in #21022
- Fix arguments passed to predict function in QA Seq2seq training script by @Observer46 in #21026
- Support turning off the model uploading in ClearML by @david1542 in #20969
- fix parameter name in docstring by @cceyda in #21032
- fix levit timm conversion file by @Bearnardd in #20938
- fix typo by @kaisugi in #21042
- fix typo by @sabaul in #21048
- Replace past with past_key_values by @ArthurZucker in #20944
- Fix warning for MCTC model by @sgugger in #21049
- remove flax file from documentation_tests.txt by @ydshieh in #21036
- Patch-past-refactor by @ArthurZucker in #21050
- Make the attention_head_size in distilbert an object attribute by @KarlFelixJoehnk in #20970
- feature: update wandb callback to upload checkpoints by @parambharat in #21035
- Fix header level by @stevhliu in #21072
- Update docstring for CLIPConfig by @yingzha in #21066
- fix typo in comment by @soulseen in #21088
- Optimize inference only mode memory if ipex is used by @sywangyi in #21083
- Fixed issue #21039 by @susnato in #21062
- Remove more unused attributes in config classes by @ydshieh in #21000
- [bnb optim] fixing test by @stas00 in #21030
- Fix past CI by @ydshieh in #20967
- Fix torchscript tests for AltCLIP by @ydshieh in #21102
- [Tokenizers] Fix a small typo by @ArthurZucker in #21104
- Update task summary part 1 by @stevhliu in #21014
- Add Spanish translation to community.mdx by @shogohida in #21055
- Rework automatic code samples in docstrings by @sgugger in #20757
- [CI-doc-daily] Remove RobertaPreLayernorm random tests by @ArthurZucker in #20992
- Use raw string for regex in tokenization_t5_fast.py by @odashi in #21125
- Fixed typo in docstring by @tkburis in #21115
- [VideoMAE] Fix docstring by @NielsRogge in #21111
- [LongT5] Remove duplicate encoder_attention_mask default value check by @guillaume-be in #21124
- Add min_new_tokens argument in generate() (implementation based on MinNewTokensLengthLogitsProcessor) by @silverriver in #21044
- Fixing batching pipelines on single items for ChunkPipeline by @Narsil in #21132
- Fixed issue #21053 by @susnato in #21065
- Fix RealmModelIntegrationTest.test_inference_open_qa by @ydshieh in #21136
- Added clefourrier as ref point for graph models in bug reports by @clefourrier in #21139
- Update TFTapasEmbeddings by @ydshieh in #21107
- [GIT] Fix training by @NielsRogge in #21133
- Fixes to TF collators by @Rocketknight1 in #21143
- TF: serializable hubert by @gante in #20966
- Generate: TF contrastive search must pop use_cache from model_kwargs by @gante in #21149
- Rename test_feature_extraction files by @amyeroberts in #21140
- Small simplification to TopKLogitsWarper by @njhill in #21130
- feat: add standalone guide on XLA support. by @sayakpaul in #21141
- Clarify and add missing typical_p argument docstring. by @shermansiu in #21095
- Fixing offline mode for pipeline (when inferring task). by @Narsil in #21113
- Whisper Timestamp processor and prediction by @ArthurZucker in #20620
- Add batch of resources by @NielsRogge in #20647
- Change variable name to prevent shadowing by @sayakpaul in #21153
- CLI: update hub PR URL by @gante in #21154
- Add resources by @NielsRogge in #20872
- Add: tensorflow example for image classification task guide by @MKhalusova in #21038
- Add: An introductory guide for text generation by @MKhalusova in #21090
- Refactoring of the text generate API docs by @MKhalusova in #21112
- Add Epsilon- and Eta-Sampling by @shermansiu in #21121
- Fixed num_channels!=3 normalization training by @layjain in #20630
- 🌐 [i18n-KO] Translated installation.mdx to Korean by @wonhyeongseo in #20948
- Add Japanese translation to multilingual.mdx by @shogohida in #21084
- Make test_save_pretrained_signatures slow test by @ydshieh in #21105
- blip support for training by @younesbelkada in #21021
- Remove Roberta Dependencies from XLM Roberta Flax and Tensorflow models by @samuelzxu in #21047
- Fix typos in documentation by @jordimas in #21160
- OPT: Fix batched generation with FLAX by @gante in #21150
- Fix git model for generate with beam search. by @PeterL1n in #21071
- fix the issue that the output dict of jit model could not get [:2] by @sywangyi in #21146
- using raw string for regex to search <extra_id> by @pfliu-nlp in #21162
- Fix doctest CI by @ydshieh in #21166
- Adapt repository creation to latest hf_hub by @sgugger in #21158
- Add AWS Neuron torchrun support by @jeffhataws in #20806
- Rewrite a couple of lines in the TF XLA doc by @Rocketknight1 in #21177
- [issues template] update deepspeed owners by @stas00 in #21027
- Fix Mask2FormerForUniversalSegmentation by @ydshieh in #21175
- Update year 2020 to 2023 in one file by @ydshieh in #21190
- workaround documentation rendering bug by @hollance in #21189
- Fix device issue in UperNetModelIntegrationTest by @ydshieh in #21192
- Updates to computer vision section of the Preprocess doc by @MKhalusova in #21181
- Rename GLPN image processor tests by @amyeroberts in #21194
- Update examples with image processors by @amyeroberts in #21155
- hertz is already per second by @hollance in #21188
- [Whisper] Fix timestamp processor by @ArthurZucker in #21187
- Add hallucination filter by @KMFODA in #18675
- [CVT] Fix module initialization issue by @younesbelkada in #21193
- Flax dtype-dependent numerical masking by @gante in #21197
- Add Japanese translation index.mdx by @kambehmw in #21186
- Add disclaimer for necessary fake models by @sgugger in #21178
- Enabling live automatic-speech-recognition asr for Whisper. by @Narsil in #21196
- [Whisper] Fix pipeline after timestamp merges by @ArthurZucker in #21198
- Update modeling doc strings FE -> IP by @amyeroberts in #21106
- Generate: documented function to compute the transition scores by @gante in #21191
- deleted references of self.vocab_size and self.type_vocab_size for multiple models [TF implementation] by @susnato in #21164
- Update huggingface_hub version by @ydshieh in #21212
- Fix CONFIG_ARCHIVE_MAP_MAPPING_NAMES by @ydshieh in #21207
- Fix GPTJ doctest by @ydshieh in #21213
- Declare len method in PreTrainedTokenizerBase by @thomasw21 in #21210
- Fix code example in training tutorial by @stevhliu in #21201
- Make parallelism for CircleCI jobs work - but keep it 1 for now by @ydshieh in #21157
- Fix OneFormer Docstrings by @praeclarumjj3 in #21215
- Fix task summary doctest by @stevhliu in #21200
- Remove all hf-internal-testing checkpoints that can be removed by @sgugger in #21199
- Microphone live inference catching up when inference is too slow (whisper). by @Narsil in #21219
- [BLIP] fix docstring for BlipTextxxx by @younesbelkada in #21224
- Skip failing test for now by @sgugger in #21226
- [BLIP] fix doctest by @younesbelkada in #21217
- Generate: precision fix in compute_transition_scores doctests by @gante in #21251
- Extend Script to enable conversion of Encoder Only T5x Models to Pytorch by @ToluClassics in #20907
- Add test_image_processing_common.py by @amyeroberts in #20785
- [GIT] Convert more checkpoints by @NielsRogge in #21245
- Optimize by not computing gradients for parameters set to requires_grad=False by @raghavanone in #21236
- Fix reformer CI by @ydshieh in #21254
- Add Japanese translation installation.mdx by @kambehmw in #21241
- Add scikit-learn dependency to train language-modeling by @mostafaelhoushi in #21229
- Add missing checkpoint for doctest by @amyeroberts in #21258
- Generate: save generation config with the models' .save_pretrained() by @gante in #21264
- Replace reduce_labels with do_reduce_labels by @amyeroberts in #21218
- Update tests: replace feature extractor tests with image processor by @amyeroberts in #20768
- Notebook examples grouping and update by @MKhalusova in #21265
- Add: TensorFlow example for semantic segmentation task guide by @MKhalusova in #21223
- [ci-daily] Fix pipeline tests by @ArthurZucker in #21257
- Add class properties with warnings by @amyeroberts in #21195
- [Whisper] fix all issues with unk token by @ArthurZucker in #21250
- Supported pipeline tasks update by @MKhalusova in #21268
- Models docstring by @sgugger in #21225
- Hotfix remove tuple for git config image processor. by @Narsil in #21278
- Fix MaskFormerImageProcessor.post_process_instance_segmentation by @alaradirik in #21256
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @fcakyon
  - [New Model] Add TimeSformer model (#18908)
- @kamalkraj
- @ringohoffman
  - ESM openfold_utils type hints (#20544)
- @samuelzxu
- @alceballosa
  - Spanish translation of asr.mdx and add_new_pipeline.mdx (#20569)
- @ekgren
  - Add gpt-sw3 model to transformers (#20209)
- @AndreasMadsen
  - Implement Roberta PreLayerNorm (#20305)
- @IvanLauLinTiong
- @jongjyh
  - Add AltCLIP (#20446)
- @SD-13
  - Remove T5 dependency from mT5 model (#20949)
- @Bearnardd
- @praeclarumjj3
- @kambehmw