Big model inference
You can now use the big model inference of Accelerate directly in any call to `from_pretrained` by specifying `device_map="auto"` (or your own `device_map`). The model will automatically be loaded taking advantage of your GPU(s), offloading what doesn't fit into RAM, or even onto the hard drive if you don't have enough RAM. Your model can then be used normally for inference with nothing else to do.
```python
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained(
    "bigscience/T0pp", revision="sharded", device_map="auto"
)
```
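You can also pass a hand-written `device_map` instead of `"auto"`. A minimal sketch, with illustrative module names (inspect `model.hf_device_map` after an `"auto"` load to see the real prefixes for a given model):

```python
from transformers import AutoModelForSeq2SeqLM

# Map module-name prefixes to devices: GPU indices, "cpu" or "disk".
# These prefixes are illustrative, not the exact T0pp module names.
custom_map = {"shared": 0, "encoder": 0, "decoder": 1, "lm_head": 1}

model = AutoModelForSeq2SeqLM.from_pretrained(
    "bigscience/T0pp", revision="sharded", device_map=custom_map
)
```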
BLOOM
The BLOOM model has been proposed in its various versions through the BigScience Workshop. The architecture of BLOOM is essentially similar to GPT-3 (an auto-regressive model for next-token prediction), but it has been trained on 46 different languages, including code.
- BLOOM by @younesbelkada in #17474
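A minimal generation sketch, assuming one of the smaller checkpoints on the Hub (here `bigscience/bloom-560m`; pick whichever size fits your hardware):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

inputs = tokenizer("The BigScience Workshop trained BLOOM on", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```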
CvT
The Convolutional vision Transformer (CvT) improves the Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs.
- Add CvT by @NielsRogge and @AnugunjNaman in #17299
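A short image-classification sketch, assuming the `microsoft/cvt-13` checkpoint:

```python
import requests
import torch
from PIL import Image
from transformers import AutoFeatureExtractor, CvtForImageClassification

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/cvt-13")
model = CvtForImageClassification.from_pretrained("microsoft/cvt-13")

inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```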
GPT-NeoX
GPT-NeoX-20B is a 20 billion parameter autoregressive language model trained on the Pile, whose weights are made freely and openly available to the public through a permissive license. GPT-NeoX-20B is a particularly powerful few-shot reasoner and gains far more in performance when evaluated five-shot than similarly sized GPT-3 and FairSeq models.
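A hedged loading sketch, assuming the `EleutherAI/gpt-neox-20b` checkpoint; at 20 billion parameters you will likely want the big model inference path described above:

```python
from transformers import AutoTokenizer, GPTNeoXForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
# device_map="auto" (see "Big model inference" above) spreads the weights
# across available GPUs, CPU RAM and disk as needed; it requires accelerate.
model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b", device_map="auto"
)

inputs = tokenizer("GPT-NeoX-20B is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```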
LayoutLMv3
LayoutLMv3 simplifies LayoutLMv2 by using patch embeddings (as in ViT) instead of leveraging a CNN backbone, and pre-trains the model on 3 objectives: masked language modeling (MLM), masked image modeling (MIM) and word-patch alignment (WPA).
- Add LayoutLMv3 by @NielsRogge in #17060
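A minimal sketch of the usual processor + model pairing, assuming the `microsoft/layoutlmv3-base` checkpoint and a hypothetical local `document.png` (the processor's built-in OCR requires pytesseract):

```python
from PIL import Image
from transformers import LayoutLMv3ForTokenClassification, LayoutLMv3Processor

processor = LayoutLMv3Processor.from_pretrained("microsoft/layoutlmv3-base")
model = LayoutLMv3ForTokenClassification.from_pretrained(
    "microsoft/layoutlmv3-base", num_labels=7
)

# The processor OCRs the image into words + boxes and prepares the pixel
# values for the patch embeddings alongside the text inputs.
image = Image.open("document.png").convert("RGB")
encoding = processor(image, return_tensors="pt")
outputs = model(**encoding)
print(outputs.logits.shape)  # (batch, sequence_length, num_labels)
```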
LeViT
LeViT improves the Vision Transformer (ViT) in performance and efficiency through a few architectural differences, such as activation maps with decreasing resolutions in the Transformer and the introduction of an attention bias to integrate positional information.
- Adding LeViT Model by Facebook by @AnugunjNaman in #17466
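A classification sketch analogous to the CvT one above, assuming the `facebook/levit-128S` checkpoint:

```python
import requests
import torch
from PIL import Image
from transformers import LevitFeatureExtractor, LevitForImageClassification

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

feature_extractor = LevitFeatureExtractor.from_pretrained("facebook/levit-128S")
model = LevitForImageClassification.from_pretrained("facebook/levit-128S")

inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```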
LongT5
The LongT5 model is an extension of the T5 model that enables one of two efficient attention mechanisms: (1) local attention, or (2) transient-global attention. It can handle input sequences up to 16,384 tokens long.
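A quick sketch, assuming the `google/long-t5-tglobal-base` checkpoint (the transient-global variant); the checkpoint is pre-trained only, so fine-tune before expecting useful summaries:

```python
from transformers import AutoTokenizer, LongT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
model = LongT5ForConditionalGeneration.from_pretrained("google/long-t5-tglobal-base")

# Long inputs (up to 16,384 tokens) are the point of this model.
text = "summarize: " + "a very long document " * 1000
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```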
M-CTC-T
The M-CTC-T model is a 1B-parameter transformer encoder with a CTC head over 8,065 character labels and a language identification head over 60 language ID labels. It is trained on Common Voice (version 6.1, December 2020 release) and VoxPopuli. After training on Common Voice and VoxPopuli, the model is trained on Common Voice only. The labels are unnormalized character-level transcripts (punctuation and capitalization are not removed). The model takes as input Mel filterbank features from a 16kHz audio signal.
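A transcription sketch, assuming the `speechbrain/m-ctc-t-large` checkpoint and a 16kHz waveform as a 1-D float array (zeros stand in for real audio here):

```python
import numpy as np
import torch
from transformers import MCTCTForCTC, MCTCTProcessor

processor = MCTCTProcessor.from_pretrained("speechbrain/m-ctc-t-large")
model = MCTCTForCTC.from_pretrained("speechbrain/m-ctc-t-large")

# One second of silence in place of real 16kHz speech.
waveform = np.zeros(16000, dtype=np.float32)
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(processor.batch_decode(logits.argmax(-1))[0])
```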
Trajectory Transformer
This Transformer is used for deep reinforcement learning. To use it, you need to create sequences from actions, states and rewards from all previous timesteps. This model will treat all these elements together as one big sequence (a trajectory).
- Add trajectory transformer by @CarlCochet in #17141
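A heavily hedged sketch of the trajectory-as-sequence idea, assuming the `CarlCochet/trajectory-transformer-halfcheetah-medium-v2` checkpoint and that states, actions and rewards have already been discretized into integer tokens:

```python
import torch
from transformers import TrajectoryTransformerModel

model = TrajectoryTransformerModel.from_pretrained(
    "CarlCochet/trajectory-transformer-halfcheetah-medium-v2"
)

# One batch: each timestep's discretized state, action and reward tokens
# concatenated into a single flat sequence (random tokens for illustration).
trajectories = torch.randint(0, model.config.vocab_size, (1, 17))
outputs = model(trajectories)
print(outputs.logits.shape)
```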
Wav2Vec2-Conformer
The Wav2Vec2-Conformer is an updated version of Wav2Vec2 from fairseq S2T: Fast Speech-to-Text. It has more parameters than Wav2Vec2, but also yields an improved word error rate.
- [Wav2Vec2Conformer] Official release by @patrickvonplaten in #17709
- Add Wav2Vec2Conformer by @patrickvonplaten in #16812
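A CTC transcription sketch in the usual Wav2Vec2 style, assuming the `facebook/wav2vec2-conformer-rope-large-960h-ft` checkpoint:

```python
import numpy as np
import torch
from transformers import Wav2Vec2ConformerForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained(
    "facebook/wav2vec2-conformer-rope-large-960h-ft"
)
model = Wav2Vec2ConformerForCTC.from_pretrained(
    "facebook/wav2vec2-conformer-rope-large-960h-ft"
)

# One second of silence in place of real 16kHz speech.
waveform = np.zeros(16000, dtype=np.float32)
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(processor.batch_decode(logits.argmax(-1))[0])
```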
TensorFlow implementations
Data2VecVision for semantic segmentation, OPT and Swin are now available in TensorFlow.
- Add TFData2VecVision for semantic segmentation by @sayakpaul in #17271
- Opt in flax and tf by @ArthurZucker in #17388
- Add Tensorflow Swin model by @amyeroberts in #16988
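For instance, a minimal sketch with the TensorFlow Swin port, assuming the `microsoft/swin-tiny-patch4-window7-224` checkpoint (pass `from_pt=True` if the checkpoint has no TF weights yet):

```python
import numpy as np
from transformers import AutoFeatureExtractor, TFSwinForImageClassification

feature_extractor = AutoFeatureExtractor.from_pretrained(
    "microsoft/swin-tiny-patch4-window7-224"
)
model = TFSwinForImageClassification.from_pretrained(
    "microsoft/swin-tiny-patch4-window7-224"
)

# A dummy RGB image in place of a real one.
image = np.zeros((224, 224, 3), dtype=np.uint8)
inputs = feature_extractor(images=image, return_tensors="tf")
logits = model(**inputs).logits
print(model.config.id2label[int(np.argmax(logits, axis=-1)[0])])
```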
Flax implementations
OPT is now available in Flax.
- Opt in flax and tf by @ArthurZucker in #17388
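A minimal Flax sketch, assuming the `facebook/opt-125m` checkpoint (`from_pt=True` converts the PyTorch weights if no Flax weights are on the Hub):

```python
from transformers import AutoTokenizer, FlaxOPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = FlaxOPTForCausalLM.from_pretrained("facebook/opt-125m", from_pt=True)

inputs = tokenizer("Hello, my name is", return_tensors="np")
outputs = model.generate(inputs["input_ids"], max_length=20)
print(tokenizer.decode(outputs.sequences[0], skip_special_tokens=True))
```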
Documentation translation in Italian and Portuguese
A community effort has been started to translate the documentation into two new languages: Italian and Portuguese.
- Translation/italian: added pipeline_tutorial.mdx [Issue: #17459] by @nickprock in #17507
- Add installation.mdx Italian translation by @mfumanelli in #17530
- Setup for Italian translation and add quicktour.mdx translation by @mfumanelli in #17472
- Adding the Portuguese version of the tasks/token_classification.mdx documentation by @jonatasgrosman in #17492
- Adding the Portuguese version of the tasks/sequence_classification.mdx documentation by @jonatasgrosman in #17352
- [ fast_tokenizers.mdx ] - Added translation to portuguese to tutorial by @Fellip15 in #17076
- Added translation of installation.mdx to Portuguese Issue #16824 by @rzimmerdev in #16979
Improvements and bugfixes
- Sort the model doc Toc Alphabetically by @sgugger in #17723
- normalize keys_to_ignore by @stas00 in #17722
- CLI: Add flag to push TF weights directly into main by @gante in #17720
- Update requirements.txt by @jeffra in #17719
- Revert "Change push CI to run on workflow_run event by @ydshieh in #17692)"
- Documentation: RemBERT fixes by @stefan-it in #17641
- Change push CI to run on workflow_run event by @ydshieh in #17692
- fix tolerance for a bloom slow test by @younesbelkada in #17634
- [LongT5] disable model parallel test by @patil-suraj in #17702
- FX function refactor by @michaelbenayoun in #17625
- Add `BloomForSequenceClassification` and `BloomForTokenClassification` classes by @haileyschoelkopf in #17639
- Swin main layer by @amyeroberts in #17693
- Include a comment to reflect Amy's contributions by @sayakpaul in #17689
- Rag end2end new by @shamanez in #17650
- [LongT5] Rename checkpoints by @patrickvonplaten in #17700
- Extend Transformers Trainer Class to Enable PyTorch Torchscript for Inference by @jianan-gu in #17153
- Fix doc builder Dockerfile by @ydshieh in #17435
- Add FP16 Support for SageMaker Model Parallel by @haohanchen-yagao in #17386
- enable cpu distribution training using mpirun by @sywangyi in #17570
- Add Ray's scope to training arguments by @BramVanroy in #17629
- Update modeling_gpt_neox.py by @willfrey in #17575
- Fix dtype getter by @sgugger in #17668
- explicitly set utf8 for Windows by @BramVanroy in #17664
- Fixed documentation typo, parameter name is evaluation_strategy, not eval_strategy by @sainttttt in #17669
- Add Visual Question Answering (VQA) pipeline by @sijunhe in #17286
- Fix typo in adding_a_new_model README by @ayushtues in #17679
- Avoid GPU OOM for a TF Rag test by @ydshieh in #17638
- fix typo from emtpy to empty by @domenicrosati in #17643
- [Generation Test] Make fast test actually fast by @patrickvonplaten in #17661
- [Data2Vec] Speed up test by @patrickvonplaten in #17660
- [BigBirdFlaxTests] Make tests slow by @patrickvonplaten in #17658
- update README.md by @loubnabnl in #17657
- 🐛 Properly raise `RepoNotFoundError` when not authenticated by @SBrandeis in #17651
- Fixes #17128. by @mygithubid1 in #17356
- Fix dtype getters by @sgugger in #17656
- Add skip logic for attentions test - Levit by @amyeroberts in #17633
- Enable crop_center method to handle (W, H, C) images by @alaradirik in #17626
- Move Clip image utils to image_utils.py by @alaradirik in #17628
- Skip tests until bug is fixed. by @sgugger in #17646
- Translation/autoclass by @mfumanelli in #17615
- didn't exist in pt-1.9 by @stas00 in #17644
- convert assertion to raised exception in debertav2 by @sam-h-bean in #17619
- Pre-build DeepSpeed by @ydshieh in #17607
- [modeling_utils] torch_dtype/auto floating dtype fixes by @stas00 in #17614
- Running a pipeline of `float16`. by @Narsil in #17637
- fix use_amp rename after pr 17138 by @stas00 in #17636
- Fix very long job failure text in Slack report by @ydshieh in #17630
- Adding `top_k` argument to `text-classification` pipeline. by @Narsil in #17606
- Mention in the doc we drop support for fairscale by @sgugger in #17610
- Use shape_list to safely get shapes for Swin by @amyeroberts in #17591
- Add ONNX support for ConvNeXT by @regisss in #17627
- Add ONNX support for ResNet by @regisss in #17585
- has_attentions - consistent test skipping logic and tf tests by @amyeroberts in #17495
- CLI: Print all different tensors on exception by @gante in #17612
- TF: Merge PT and TF behavior for Bart when no decoder_input_ids are passed by @gante in #17593
- Fix telemetry URL by @sgugger in #17608
- CLI: Properly detect encoder-decoder models by @gante in #17605
- Fix link for community notebooks by @ngoquanghuy99 in #17602
- Extend Transformers Trainer Class to Enable CPU AMP and Integrate Intel Extension for PyTorch by @jianan-gu in #17138
- fix `train_new_from_iterator` in the case of byte-level tokenizers by @SaulLu in #17549
- Explicit versions in docker files by @ydshieh in #17586
- CLI: add stricter automatic checks to `pt-to-tf` by @gante in #17588
- fix by @ydshieh in #17589
- quicktour.mdx en -> pt translation by @vitorfrois in #17074
- Fx support for Deberta-v[1-2], Hubert and LXMERT by @michaelbenayoun in #17539
- Add examples telemetry by @sgugger in #17552
- Fix gendered sentence in Spanish translation by @omarespejel in #17558
- Fix circular import in onnx.utils by @sgugger in #17577
- Use latest stable PyTorch/DeepSpeed for Push & Scheduled CI by @ydshieh in #17417
- Remove circular imports in layoutlm/__init__.py by @regisss in #17576
- Add magic method to our TF models to convert datasets with column inference by @Rocketknight1 in #17160
- [deepspeed / testing] reset global state by @stas00 in #17553
- Remove RuntimeErrors for NaN-checking in 20B by @zphang in #17563
- fix integration test levit by @AnugunjNaman in #17555
- [deepspeed] fix load_best_model test by @stas00 in #17550
- Update index.mdx by @BritneyMuller in #17547
- Clean imports to fix test_fetcher by @sgugger in #17531
- Update run_glue_no_trainer.py by @bofenghuang in #17546
- Fix all offload and MP tests by @sgugger in #17533
- Fix bug - layer names and activation from previous refactor by @amyeroberts in #17524
- Add support for Perceiver ONNX export by @deutschmn in #17213
- Allow from transformers import TypicalLogitsWarper by @teticio in #17477
- Add Gated-SiLU to T5 by @DanielHesslow in #17420
- Update URL for Hub PR docs by @lewtun in #17532
- fix OPT-Flax CI tests by @ArthurZucker in #17512
- [trainer/deepspeed] load_best_model (reimplement re-init) by @stas00 in #17151
- Implemented loss for training AudioFrameClassification by @MorenoLaQuatra in #17513
- Update configuration_auto.py by @kamalkraj in #17527
- Check list of models in the main README and sort it by @sgugger in #17517
- Fix when Accelerate is not installed by @sgugger in #17518
- Clean README in post release job as well. by @sgugger in #17519
- Fix CI tests hang forever by @ydshieh in #17471
- Print more library versions in CI by @ydshieh in #17384
- Split push CI into 2 workflows by @ydshieh in #17369
- Fix Tapas tests by @ydshieh in #17510
- CLI: tool to convert PT into TF weights and open hub PR by @gante in #17497
- Fix flakey no-trainer test by @muellerzr in #17515
- Deal with the error when task is regression by @fireindark707 in #16330
- Fix CTRL tests by @ydshieh in #17508
- Fix LayoutXLMProcessorTest by @ydshieh in #17506
- Debug LukeForMaskedLM by @Ryou0634 in #17499
- Fix MP and CPU offload tests for Funnel and GPT-Neo by @sgugger in #17503
- Exclude Databricks from notebook env by @sgugger in #17496
- Fix `tokenizer` type annotation in `pipeline(...)` by @willfrey in #17500
- Refactor classes to inherit from nn.Module instead of nn.Sequential by @amyeroberts in #17493
- Fix wav2vec2 export onnx model with attention_mask error by @nilboy in #16004
- Add warning when using older version of torch for ViltFeatureExtractor by @xhluca in #16756
- Fix typo of variable names for key and query projection layer by @Kyeongpil in #17155
- Fixed wrong error message for missing weight file by @123jimin in #17216
- Add OnnxConfig for SqueezeBert iss17314 by @Ruihua-Fang in #17315
- [GPT2Tokenizer] Fix GPT2 with bos token by @patrickvonplaten in #17498
- [Json configs] Make json prettier for all saved tokenizer files & ensure same json format for all processors (tok + feat_extract) by @patrickvonplaten in #17457
- Accumulate tokens into batches in `PreTrainedTokenizerBase.add_tokens()` by @Witiko in #17119
- Add HF.co for PRs / Issues regarding specific model checkpoints by @patrickvonplaten in #17485
- Fix checkpoint name by @ydshieh in #17484
- Docker image build in parallel by @ydshieh in #17434
- Added XLM onnx config by @nandwalritik in #17030
- Disk offload fix by @sgugger in #17428
- TF: GPT-2 generation supports left-padding by @gante in #17426
- Fix ViTMAEModelTester by @ydshieh in #17470
- [Generate] Fix output scores greedy search by @patrickvonplaten in #17442
- Fix nits by @omarespejel in #17349
- Fx support for multiple model architectures by @michaelbenayoun in #17393
- typo IBERT in repr quant_mode by @scratchmex in #17398
- Fix typo (remove parenthesis) by @mikcnt in #17415
- Improve notrainer examples by @pacman100 in #17449
- [OPT] Fix bos token id default by @patrickvonplaten in #17441
- Fix model parallelism test by @sgugger in #17439
- Pin protobuf that breaks TensorBoard in PyTorch by @sgugger in #17440
- Spanish translation of the file preprocessing.mdx by @yharyarias in #16299
- Spanish translation of the files sagemaker.mdx and image_classification.mdx by @SimplyJuanjo in #17262
- Added es version of bertology.mdx doc by @jQuinRivero in #17255
- Wav2vec2 finetuning shared file system by @patrickvonplaten in #17423
- fix link in performance docs by @lvwerra in #17419
- Add link to Hub PR docs in model cards by @lewtun in #17421
- Upd AutoTokenizer.from_pretrained doc examples by @c00k1ez in #17416
- Support compilation via Torchdynamo, AOT Autograd, NVFuser by @anijain2305 in #17308
- Add test for new model parallelism features by @sgugger in #17401
- Make check_init script more robust and clean inits by @sgugger in #17408
- Fix README localizer script by @sgugger in #17407
- Fix expected value for OPT test `test_inference_no_head` by @ydshieh in #17395
- Clean up CLIP tests by @NielsRogge in #17380
- Enabling `imageGPT` auto feature extractor. by @Narsil in #16871
- Add support for `device_map="auto"` to OPT by @sgugger in #17382
- OPTForCausalLM lm_head input size should be config.word_embed_proj_dim by @vfbd in #17225
- Traced models serialization and torchscripting fix by @michaelbenayoun in #17206
- Fix Comet ML integration by @mxschmdt in #17381
- Fix cvt docstrings by @AnugunjNaman in #17367
- Correct & Improve Doctests for LayoutLMv2 by @gnolai in #17168
- Fix CodeParrot training script by @loubnabnl in #17291
- Fix a typo relative_postion_if_large -> relative_position_if_large by @stancld in #17366
- Pin dill to fix examples by @sgugger in #17368
- [Test OPT] Add batch generation test opt by @patrickvonplaten in #17359
- Fix bug in Wav2Vec2 pretrain example by @ddobokki in #17326
- fix for 17292 by @nadahlberg in #17293
- [Generation] Fix Transition probs by @patrickvonplaten in #17311
- [OPT] Run test in lower precision on GPU by @patrickvonplaten in #17353
- Adding `batch_size` test to QA pipeline. by @Narsil in #17330
- [BC] Fixing usage of text pairs by @Narsil in #17324
- [tests] fix copy-n-paste error by @stas00 in #17312
- Fix ci_url might be None by @ydshieh in #17332
- fix by @ydshieh in #17337
- Fix metric calculation in examples and setup tests to run on multi-gpu for no_trainer scripts by @muellerzr in #17331
- docs for typical decoding by @jadermcs in #17186
- Not send successful report by @ydshieh in #17329
- Fix test_t5_decoder_model_past_large_inputs by @ydshieh in #17320
- Add onnx export cuda support by @JingyaHuang in #17183
- Add Information Gain Filtration algorithm by @mraunak in #16953
- Fix typo by @kamalkraj in #17328
- remove by @ydshieh in #17325
- Accepting real pytorch device as arguments. by @Narsil in #17318
- Updating the docs for `max_seq_len` in QA pipeline by @Narsil in #17316
- [T5] Fix init in TF and Flax for pretraining by @patrickvonplaten in #17294
- Add type hints for ProphetNet (Pytorch) by @jQuinRivero in #17223
- fix by @patrickvonplaten in #17310
- [LED] fix global_attention_mask not being passed for generation and docs clarification about grad checkpointing by @caesar-one in #17112
- Add support for pretraining recurring span selection to Splinter by @jvcop in #17247
- Add PR author in CI report + merged by info by @ydshieh in #17298
- Fix dummy creation script by @sgugger in #17304
- Doctest longformer by @KMFODA in #16441
- [Test] Fix W2V-Conformer integration test by @patrickvonplaten in #17303
- Improve mismatched sizes management when loading a pretrained model by @regisss in #17257
- correct opt by @patrickvonplaten in #17301
- Rewrite TensorFlow train_step and test_step by @Rocketknight1 in #17057
- Fix tests of mixed precision now that experimental is deprecated by @Rocketknight1 in #17300
- fix retribert's `test_torch_encode_plus_sent_to_model` by @SaulLu in #17231
- [ConvNeXT] Fix drop_path_rate by @NielsRogge in #17280
- Fix wrong PT/TF categories in CI report by @ydshieh in #17272
- Fix missing job action button in CI report by @ydshieh in #17270
- Fix test_model_parallelization by @lkm2835 in #17249
- [Tests] Fix slow opt tests by @patrickvonplaten in #17282
- docs(transformers): fix typo by @k-zehnder in #17263
- logging documentation update by @sanderland in #17174
- Use the PR URL in CI report by @ydshieh in #17269
- Fix FlavaForPreTrainingIntegrationTest CI test by @ydshieh in #17232
- Better error in the Auto API when a dep is missing by @sgugger in #17289
- Make TrainerHyperParameterSigOptIntegrationTest slow test by @ydshieh in #17288
- Automatically sort auto mappings by @sgugger in #17250
- Mlflowcallback fix nonetype error by @orieg in #17171
- Align logits and labels in OPT by @MichelBartels in #17237
- Remove next sentence prediction from supported ONNX tasks by @lewtun in #17276
- CodeParrot data pretokenization by @loubnabnl in #16932
- Update codeparrot data preprocessing by @loubnabnl in #16944
- Updated checkpoint support for Sagemaker Model Parallel by @cavdard in #17219
- fixed bug in run_mlm_flax_stream.py by @KennethEnevoldsen in #17203
- [doc] performance/scalability revamp by @stas00 in #15723
- TF - Fix convnext classification example by @gante in #17261
- Fix obvious typos in flax decoder impl by @cloudhan in #17279
- Guide to create custom models in Spanish by @ignacioct in #17158
- Translated version of model_sharing.mdx doc to spanish by @Gerard-170 in #16184
- Add PR title to push CI report by @ydshieh in #17246
- Fix push CI channel by @ydshieh in #17242
- install dev. version of accelerate by @ydshieh in #17243
- Fix Trainer for Datasets that don't have dict items by @sgugger in #17239
- Handle copyright in add-new-model-like by @sgugger in #17218
- fix --gpus option for docker by @ydshieh in #17235
- Update self-push workflow by @ydshieh in #17177
- OPT - fix docstring and improve tests slightly by @patrickvonplaten in #17228
- OPT-fix by @younesbelkada in #17229
- Fix typo in bug report template by @fxmarty in #17178
- Black preview by @sgugger in #17217
- update BART docs by @patil-suraj in #17212
- Add test to ensure models can take int64 inputs by @Rocketknight1 in #17210
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @sayakpaul
- @jianan-gu
- @stancld
- @mfumanelli
- @cwkeam
    - M-CTC-T Model (#16402)
- @zphang
- @AnugunjNaman
- @yharyarias
    - Spanish translation of the file preprocessing.mdx (#16299)
- @mraunak
    - Add Information Gain Filtration algorithm (#16953)
- @rzimmerdev