v4.28.0: LLaMa, Pix2Struct, MatCha, DePlot, MEGA, NLLB-MoE, GPTBigCode

LLaMA

The LLaMA model was proposed in LLaMA: Open and Efficient Foundation Language Models. It is a collection of foundation language models ranging from 7B to 65B parameters. You can request access to the weights here, then use the conversion script to generate a checkpoint compatible with Hugging Face Transformers.
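
A rough sketch of the conversion-then-load workflow (paths are placeholders and the script flags may differ slightly between versions; check the script's --help):

# Convert the original weights first, e.g. (paths are placeholders):
#   python src/transformers/models/llama/convert_llama_weights_to_hf.py \
#       --input_dir /path/to/downloaded/llama/weights --model_size 7B --output_dir /path/to/llama-7b-hf
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("/path/to/llama-7b-hf")
model = LlamaForCausalLM.from_pretrained("/path/to/llama-7b-hf")

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))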

Pix2Struct, MatCha, DePlot

Pix2Struct is a pretrained image-to-text model for purely visual language understanding, which can be finetuned on tasks containing visually-situated language. Pix2Struct has been fine-tuned on various tasks and datasets, ranging from image captioning and visual question answering (VQA) over different inputs (books, charts, science diagrams) to captioning UI components, among others.
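
A minimal captioning sketch, assuming one of the released task-specific checkpoints (the checkpoint name and image URL below are illustrative, not taken from this release note):

import requests
from PIL import Image
from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

# Checkpoint and image are illustrative; several task-specific Pix2Struct checkpoints exist.
processor = Pix2StructProcessor.from_pretrained("google/pix2struct-textcaps-base")
model = Pix2StructForConditionalGeneration.from_pretrained("google/pix2struct-textcaps-base")

image = Image.open(requests.get("https://www.ilankelman.org/stopsigns/australia.jpg", stream=True).raw)
inputs = processor(images=image, return_tensors="pt")
generated_ids = model.generate(**inputs, max_new_tokens=50)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])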

Mega

MEGA proposes a new approach to self-attention, with each encoder layer having a multi-headed exponential moving average in addition to a single head of standard dot-product attention, giving the attention mechanism stronger positional biases. This allows MEGA to perform competitively with Transformers on standard benchmarks, including LRA, while also having significantly fewer parameters. MEGA’s compute efficiency allows it to scale to very long sequences, making it an attractive option for long-document NLP tasks.
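
A minimal loading sketch, assuming the reference checkpoint released alongside the model (the checkpoint name below is an assumption):

import torch
from transformers import AutoTokenizer, MegaModel

# Checkpoint name assumed; task heads such as MegaForMaskedLM and
# MegaForSequenceClassification are also available.
tokenizer = AutoTokenizer.from_pretrained("mnaylor/mega-base-wikitext")
model = MegaModel.from_pretrained("mnaylor/mega-base-wikitext")

inputs = tokenizer("MEGA scales to very long sequences.", return_tensors="pt")
with torch.no_grad():
    last_hidden_state = model(**inputs).last_hidden_state
print(last_hidden_state.shape)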

GPTBigCode

The model is an optimized GPT2 model with support for Multi-Query Attention; a minimal usage sketch follows the PR link below.

  • Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) by @jlamypoirier in #22575
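
A minimal usage sketch (the checkpoint name is an assumption; any GPTBigCode checkpoint on the Hub should work the same way):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Checkpoint name is an assumption; the architecture was introduced for SantaCoder/BigCode.
checkpoint = "bigcode/gpt_bigcode-santacoder"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0]))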

NLLB-MoE

The mixture of experts version of the NLLB release has been added to the library.
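
A minimal translation sketch (the checkpoint name is an assumption and the released MoE checkpoint is very large, so treat this as an illustration only):

from transformers import AutoTokenizer, NllbMoeForConditionalGeneration

# Checkpoint name assumed; the MoE checkpoint requires substantial memory to load.
tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-moe-54b")
model = NllbMoeForConditionalGeneration.from_pretrained("facebook/nllb-moe-54b")

inputs = tokenizer("Life is like a box of chocolates.", return_tensors="pt")
generated = model.generate(**inputs, forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"))
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])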

Serializing 8bit models

You can now push 8bit models to the Hub and/or load 8bit models directly from the Hub, saving memory and loading your 8bit models faster! An example repo is available here.
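
A rough sketch of the round trip (the target repo name is hypothetical; 8bit loading requires bitsandbytes and a CUDA GPU):

from transformers import AutoModelForCausalLM

# Load a model in 8bit, then push the quantized weights to the Hub (repo name is hypothetical).
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m", device_map="auto", load_in_8bit=True)
model.push_to_hub("my-username/bloom-560m-8bit")

# Later, the serialized 8bit checkpoint can be loaded back directly, without re-quantizing.
model_8bit = AutoModelForCausalLM.from_pretrained("my-username/bloom-560m-8bit", device_map="auto")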

Breaking Changes

Ordering of height and width for the BLIP image processor

Notes from the PR:

The BLIP image processor incorrectly passed in the dimensions to resize in the order (width, height). This is reordered to be correct.

In most cases this won't have an effect, as the default height and width are the same. However, it is not backwards compatible for custom configurations with different height and width settings, or for direct calls to the resize method with different height and width values (see the sketch after the PR link below).

  • 🚨🚨🚨 Fix ordering of height, width for BLIP image processor by @amyeroberts in #22466
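
A small sketch to illustrate the corrected ordering with a non-square size (the numbers are illustrative):

import numpy as np
from transformers import BlipImageProcessor

# A non-square target size makes the fix visible; values are illustrative.
image_processor = BlipImageProcessor(size={"height": 384, "width": 512})
image = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
pixel_values = image_processor(image, return_tensors="pt").pixel_values
print(pixel_values.shape)  # (1, 3, 384, 512) with the corrected (height, width) ordering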

Prefix tokens for the NLLB tokenizer

The main breaking change concerns the prefix and suffix tokens of the NLLB tokenizer: the source language code is now prepended to the input instead of being appended after the </s> token.

Previous behaviour:

>>> from transformers import NllbTokenizer
>>> tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
>>> tokenizer("How was your day?").input_ids
[13374, 1398, 4260, 4039, 248130, 2, 256047]
>>> # 2: '</s>'
>>> # 256047 : 'eng_Latn'

New behaviour:

>>> from transformers import NllbTokenizer
>>> tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
>>> tokenizer("How was your day?").input_ids
[256047, 13374, 1398, 4260, 4039, 248130, 2]

In case you have pipelines that were relying on the old behaviour, here is how you would enable it once again:

>>> from transformers import NllbTokenizer
>>> tokenizer = NllbTokenizer.from_pretrained("facebook/nllb-200-distilled-600M", legacy_behaviour=True)

  • 🚨🚨🚨 [NLLB Tokenizer] Fix the prefix tokens 🚨🚨🚨 by @ArthurZucker in #22313

TensorFlow ports

The BLIP model is now available in TensorFlow.
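
A minimal TF captioning sketch (the checkpoint name and image URL are illustrative):

import requests
from PIL import Image
from transformers import BlipProcessor, TFBlipForConditionalGeneration

# Checkpoint assumed; pass return_tensors="tf" to get TensorFlow inputs.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = TFBlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
inputs = processor(images=image, return_tensors="tf")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))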

Export TF Generate with a TF tokenizer

This PR makes it possible to export TF generate together with a TF-native tokenizer -- the full pipeline in a single TF graph; a sketch of the idea follows the PR link below.

  • Generate: Export TF generate with a TF tokenizer by @gante in #22310
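
A heavily simplified sketch of the idea, assuming a keras_nlp-backed TF tokenizer such as TFGPT2Tokenizer (the wrapper class, model choice, and generation settings below are illustrative, not the PR's API):

import tensorflow as tf
from transformers import TFAutoModelForCausalLM, TFGPT2Tokenizer

# Sketch only: a TF-native tokenizer (requires keras_nlp) plus generate(), exported as one SavedModel.
class ExportableGeneration(tf.Module):
    def __init__(self):
        super().__init__()
        self.tokenizer = TFGPT2Tokenizer.from_pretrained("gpt2")
        self.model = TFAutoModelForCausalLM.from_pretrained("gpt2")

    @tf.function(input_signature=[tf.TensorSpec([None], tf.string)])
    def serve(self, texts):
        tokenized = self.tokenizer(texts)
        # Returns generated token ids; decoding can happen outside the graph.
        return self.model.generate(
            input_ids=tokenized["input_ids"],
            attention_mask=tokenized["attention_mask"],
            max_new_tokens=10,
        )

module = ExportableGeneration()
tf.saved_model.save(module, "exported_generate", signatures={"serving_default": module.serve})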

Task guides

A new task guide has been added, focusing on depth estimation.
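
A quick sketch using the depth-estimation pipeline (model and image URL are illustrative):

from transformers import pipeline

# Model and image are illustrative choices.
depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")
result = depth_estimator("http://images.cocodataset.org/val2017/000000039769.jpg")
print(result["predicted_depth"].shape)  # dense depth map as a tensor
result["depth"].save("depth.png")       # visualisation as a PIL image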

Bugfixes and improvements

  • Load optimizer state on CPU to avoid CUDA OOM by @sgugger in #22159
  • Run all tests by default by @sgugger in #22162
  • Fix: unfinished_sequences with correct device by @Stxr in #22184
  • Revert 22152 MaskedImageCompletionOutput changes by @amyeroberts in #22187
  • Regression pipeline device by @sgugger in #22190
  • Update BridgeTowerForContrastiveLearning by @abhiwand in #22145
  • t5 remove data dependency by @prathikr in #22097
  • Fix DeepSpeed CI by @ydshieh in #22194
  • Fix typo in Align docs by @alaradirik in #22199
  • Update expected values in MgpstrModelIntegrationTest by @ydshieh in #22195
  • Italian Translation of migration.mdx by @Baelish03 in #22183
  • Update tiny model creation script by @ydshieh in #22202
  • Temporarily fix ONNX model exporting error by @SatyaJandhyalaAtMS in #21830
  • [XGLM] Add accelerate support for XGLM by @younesbelkada in #22207
  • fixes a typo in WhisperFeatureExtractor docs. by @susnato in #22208
  • Hotfix for natten issue with torch 2.0.0 on CircleCI by @ydshieh in #22218
  • fix typos in llama.mdx by @keturn in #22223
  • fix code example in mgp-str doc by @wdp-007 in #22219
  • Use dash==2.8.1 for now for daily CI by @ydshieh in #22227
  • LLaMA house-keeping by @sgugger in #22216
  • fix AutoTP in deepspeed could not work for bloom by @sywangyi in #22196
  • Add LlamaForSequenceClassification by @lewtun in #22209
  • Removed .mdx extension in two links by @MKhalusova in #22230
  • fix(docs): fix task guide links in model docs by @Seb0 in #22226
  • Fix natten by @alihassanijr in #22229
  • Revert "Use dash==2.8.1 for now for daily CI" by @ydshieh in #22233
  • Fix Unnecessary move of tensors from CPU to GPU in LlamaRotaryEmbedding by @ma787639046 in #22234
  • [trainer] param count for deepspeed zero3 by @stas00 in #22193
  • Update training_args.py -- a nightly install is not required anymore for torch.compile by @pminervini in #22266
  • [Docs] fix typos in some tokenizer docs by @yesinkim in #22256
  • Italian translation perf_infer_cpu by @nickprock in #22243
  • [Trainer] Add optional communication backends for torch.distributed when using GPU by @heya5 in #22247
  • Fix the gradient checkpointing bug of the llama model by @yqy2001 in #22270
  • Fix balanced and auto device_map by @sgugger in #22271
  • Rework a bit the LLaMA conversion script by @sgugger in #22236
  • Proper map location for optimizer load by @sgugger in #22273
  • Fix doc links by @amyeroberts in #22274
  • Move torch.compile() wrapping after DDP/FSDP wrapping to ensure correct graph breaks during training by @ani300 in #22279
  • Example of pad_to_multiple_of for padding and truncation guide & docstring update by @MKhalusova in #22278
  • Update vision docstring bool masked pos by @amyeroberts in #22237
  • replace_8bit_linear modules_to_not_convert default value fix by @BlackSamorez in #22238
  • Fix error in mixed precision training of TFCvtModel by @gcuder in #22267
  • More doctests by @ydshieh in #22268
  • fix more doctests by @ydshieh in #22292
  • Add translation perf_infer_gpu_one for it by @davidegazze in #22296
  • Restore fp16 support on xla gpu device by @ymwangg in #22300
  • Correct NATTEN function signatures and force new version by @alihassanijr in #22298
  • [deepspeed] offload + non-cpuadam optimizer exception doc by @stas00 in #22044
  • Final update of doctest by @ydshieh in #22299
  • Add MaskedImageModelingOutput by @alaradirik in #22212
  • Enable traced model for text-generation task by @jiqing-feng in #22265
  • add low_cpu_mem_usage option in run_clm.py example which will benefit… by @sywangyi in #22288
  • fix: Allow only test_file in pytorch and flax summarization by @connor-henderson in #22293
  • Fix position embeddings for GPT-J and CodeGen by @njhill in #22069
  • Fixed bug to calculate correct xpath_sub_list in MarkupLMTokenizer by @silentghoul-spec in #22302
  • Enforce max_memory for device_map strategies by @sgugger in #22311
  • Beef up Llama tests by @gante in #22314
  • docs: Resolve incorrect type typo in trainer methods by @tomaarsen in #22316
  • Chunkable token classification pipeline by @luccailliau in #21771
  • Fix PipelineTests skip conditions by @ydshieh in #22320
  • [deepspeed zero3] need generate(synced_gpus=True, ...) by @stas00 in #22242
  • [gptj] support older pytorch version by @stas00 in #22325
  • Move common properties to BackboneMixin by @amyeroberts in #21855
  • Backbone add mixin tests by @amyeroberts in #22542
  • Backbone add out indices by @amyeroberts in #22493
  • [MBart] Add accelerate support for MBart by @younesbelkada in #22309
  • Fixed gradient checkpoint bug for TimeSeriesTransformer by @mollerup23 in #22272
  • Mention why one needs to specify max_steps in Trainer by @lhoestq in #22333
  • Fix various imports by @sgugger in #22281
  • Minor typo in pipeline FillMaskPipeline's documentation. by @SamuelLarkin in #22339
  • Added type hints to TFDeiTModel by @Batese2001 in #22327
  • Fix --bf16 option support for Neuron after PR #22300 by @jeffhataws in #22307
  • Generate: add test for left-padding support by @gante in #22322
  • Enable training Llama with model or pipeline parallelism by @kooshi in #22329
  • Automatically create/update tiny models by @ydshieh in #22275
  • [HFTracer] Make embeddings ops take on the dtype of the weight by @jamesr66a in #22347
  • Fix typo in Greedy Search Description by @awinml in #22345
  • Generate: Add GPTNeoX integration test by @gante in #22346
  • Update docker files to use official torch 2.0.0 by @ydshieh in #22357
  • Pin tensorflow-text to go with tensorflow by @sgugger in #22362
  • Improve error message by @Mahrkeenerh in #22361
  • TensorFlow: pin maximum version to 2.12 by @gante in #22364
  • Resnet flax by @Shubhamai in #21472
  • [Trainer] add disclaimer that full_determinism is slow by @stas00 in #22368
  • [safetensors] don't use in torch<1.10 by @stas00 in #22370
  • TensorFlow: additional missing cmake dependencies in CI by @gante in #22383
  • Changed world_size() to get_world_size() bugfix by @Charlie-Bell in #22381
  • Translated documentation in italian by @nickprock in #22388
  • Adapt find_tied_parameters to handle breaking change in Accelerate by @sgugger in #22360
  • load_in_8bit now respects 'balanced' device maps in multi-gpu environments by @kooshi in #22377
  • Wav2Vec2ProcessorWithLM can return N best hypotheses now by @vsokolovskii in #22235
  • Seq2seq trainer generation config arg by @Natooz in #22323
  • Generate: support for left-padding on GPTNeoX and Llama by @gante in #22382
  • [bnb] Force requires_grad to be False by @younesbelkada in #22396
  • Transformers env safetensors by @sgugger in #22400
  • [Pix2Struct] Add support to resize embeddings by @NielsRogge in #22394
  • Trainer: move Seq2SeqTrainer imports under the typing guard by @gante in #22401
  • Trainer: missing None check by @gante in #22404
  • Hardware Auto-Setup for Examples by @dongreenberg in #22319
  • [neptune] fix checkpoint bug with relative out_dir by @kshitij12345 in #22102
  • Fix bug in perplexity guide calculations and update perplexity numbers. Fixes #22348 by @fpgaminer in #22411
  • [performance] ensure causal_mask is created directly on device by @jeffra in #22378
  • MBart: Fix docs and doctests by @gante in #22422
  • Add clean_up_tokenization_spaces to config by @ArthurZucker in #22341
  • Hyperparameter search reporting to W&B by @NoB0 in #22440
  • [bnb] fix bnb failing test by @younesbelkada in #22439
  • [Generate] Add conditional generation for multimodal models by @younesbelkada in #22424
  • Don't hard error when cache version can't be converted to int by @sgugger in #22427
  • Use real tokenizers if tiny version(s) creation has issue(s) by @ydshieh in #22428
  • Revert "Error (also in original) model, scaling only q matrix not qk.T dot product (qk.T/sqrt(dim_per_head))" by @sgugger in #22444
  • [Pix2Struct] Fix slow test by @younesbelkada in #22448
  • Revert "Fix --bf16 option support for Neuron after PR #22300" by @jeffhataws in #22451
  • Update Neptune docs by @normandy7 in #22452
  • Avoid using personal HF token in CI by @ydshieh in #22453
  • Update release instructions by @sgugger in #22454
  • Pin ruff by @sgugger in #22455
  • Update: ignore padding support for TransfoXL training when n_clusters==0 by @StefanHeng in #22457
  • Rescale image back if it was scaled during PIL conversion by @amyeroberts in #22458
  • Skip flaky NLLB Moe test for now by @amyeroberts in #22463
  • Guard imports of PreTrainedTokenizerFast on is_tokenizers_available by @hvaara in #22285
  • [NLLB-MoE] model_type update for auto mapping by @ArthurZucker in #22470
  • Llama: support for max_position_embeddings by @gante in #22471
  • Docs fix: Multinomial sampling decoding needs "num_beams=1", since by default it is usually not 1. by @manueldeprada in #22473
  • (Re-)Enable Nightly + Past CI by @ydshieh in #22393
  • Relax eos_token_id < 0 checks in generate() from ValueError to warning by @lewtun in #22472
  • Update Wav2Vec2ProcessorWithLM doc example by @ydshieh in #22474
  • Making sure we can use safetensors to serialize all the time. by @Narsil in #22437
  • Update Neptune callback docstring by @normandy7 in #22497
  • Test fetch v2 by @sgugger in #22367
  • Update convert_llama_weights_to_hf.py by @Ricardokevins in #22525
  • [Time-Series] fix past_observed_mask type by @elisim in #22076
  • Fix llama tokenizer by @ArthurZucker in #22402
  • [WIP] docs: ko: sagemaker.mdx by @jungnerd in #22509
  • added biogpt token classifier by @upjabir in #22447
  • Generate: TextIteratorStreamer (streamer for gradio) by @gante in #22501
  • Fix convert_opt_original_pytorch_checkpoint_to_pytorch.py typo by @larekrow in #22526
  • llama docs: fix conversion script url by @python273 in #22514
  • fix LayoutLMv3TokenizerFast subword label after 'Ġ' token by @thibaultdouzon in #21695
  • [BLIP] fix cross attentions for BlipTextEncoder by @zhbh01 in #22515
  • [Trainer] Force is_model_parallel when model is loaded in multiple GPUs using accelerate by @younesbelkada in #22532
  • [T5] Enable naive Pipeline Parallelism training for T5 by @younesbelkada in #22535
  • Fix missing metrics with multiple eval datasets by @hawkeoni in #22536
  • [setup] drop deprecated distutils usage by @XuehaiPan in #22531
  • Generate: Enable easier TextStreamer customization by @vblagoje in #22516
  • [setup] migrate setup script to pyproject.toml by @XuehaiPan in #22539
  • Update test_image_processing_pix2struct.py by @younesbelkada in #22543
  • Fix OPTForQuestionAnswering doc string by @curlup in #22481
  • Generate: Add text streamer decoding options by @gante in #22544
  • 🔥py38 + torch 2 🔥🔥🔥🚀 by @ydshieh in #22204
  • Time to Say Goodbye, torch 1.7 and 1.8 by @ydshieh in #22291
  • [Roformer] Fixing a bug in RoFormerEncoder where it was ignoring the length of past_key_values when generating as a decoder by @TheWall9 in #22416
  • Implemented safetensors checkpoints save/load for Trainer by @ViktorooReps in #22498
  • Remove hack for dynamic modules and use Python functions instead by @sgugger in #22537
  • [bnb] Fix typo by @younesbelkada in #22556
  • Add id2label and label2id to model's config in run_xnil by @maziyarpanahi in #22558
  • Soft error whisper. by @Narsil in #22475
  • corrected the code comment for the output of find_pruneable_heads_and_indices by @SunHaozhe in #22557
  • Flax Regnet by @Shubhamai in #21867
  • fix _no_split_modules for Whisper model by @pacman100 in #22486
  • Fix inverted conditional in TF common test! by @Rocketknight1 in #22540
  • Generate: TextIteratorStreamer timeout by @gante in #22576
  • Move back doctest instructions to setup.cfg by @sgugger in #22587
  • Tests: disable accelerate_tests mark warnings by @gante in #22585
  • Fix PT-TF equivalence test for GPT1 by @Rocketknight1 in #22586
  • Add thousands separator in training summary by @qmeeus in #22583
  • docs: ko: complete _toctree.yml by @wonhyeongseo in #22581
  • Sync preprocesses before loading the processor at run_speech_recognition_ctc.py by @mpenagar in #21926
  • Fix a typo in one of the BLIP pretrained checkpoint names by @Rocketknight1 in #22588
  • Adding support for BPE merge creation from scores instead of ids. by @Narsil in #22582
  • Use native TF checkpoints for the BLIP TF tests by @Rocketknight1 in #22593
  • feat(model parallelism): moving the labels to the same device as the logits for gpt2 and bart by @kaustubh-s1 in #22591
  • Adding Llama FastTokenizer support. by @Narsil in #22264
  • Revert error back into warning for byte fallback conversion. by @Narsil in #22607
  • Seq2SeqTrainer: use unwrapped model to retrieve the generation config by @gante in #22584
  • Make tiny model creation + pipeline testing more robust by @ydshieh in #22500
  • docs: Fix broken link to generation strategies by @connor-henderson in #22623
  • update_pip_test_mapping by @ydshieh in #22606
  • A script to add/update pipeline_model_mapping systematically by @ydshieh in #22180
  • [bnb] 8bit models should not be converted to DDP by @younesbelkada in #22628
  • LlamaTokenizerFast Fix (.., from_slow=True). by @Narsil in #22630
  • [Blip] Fix slow tests and doctests with correct values by @younesbelkada in #22632
  • Update tiny model summary file for recent models by @ydshieh in #22637
  • fix FSDP version related issues by @pacman100 in #22489
  • 🌐[i18n-KO] Translate autoclass_tutorial to Korean and Fix the typo of quicktour by @gabrielwithappy in #22533
  • Move labels to the same device as logits for LlamaForSequenceClassification and Blip2 by @xssChauhan in #22596
  • Fix typo by @Ronalmoo in #22650
  • Fix MegaModel CI by @ydshieh in #22652
  • 🌐 [i18n-KO] Translated pipeline_tutorial.mdx to Korean by @wonhyeongseo in #22508
  • Small nit, by @ArthurZucker in #22653
  • [tokenization] do not push special file by @ArthurZucker in #22657
  • [OPT] Fix default attention mask size by @ArthurZucker in #22649
  • Generate: add API warning to streamers by @gante in #22659
  • Revert migration of setup to pyproject.toml by @sgugger in #22658
  • moved labels to the same device as logits for BLOOM, GPT Neo, GPT NeoX, RoBERTa and VIT models by @iamarunbrahma in #22663
  • Model parallelism: Moving labels to the same device as logits for BridgeTower models by @shahad-mahmud in #22676
  • (feat): Moving labels to same device as logits for Deit by @xssChauhan in #22679
  • Make dynamic code work with offline mode by @sgugger in #22661
  • Fix quantization docs typo by @python273 in #22666
  • use func to check can_generate by @xin3he in #22643
  • add GPTNeoXForSequenceClassification by @Asugawara in #22671
  • Model parallelism: Moving labels to same devices as the logits are by @shahad-mahmud in #22691
  • Update some MarkupLM tests' expected values by @ydshieh in #22667
  • Make it easier to develop without a dev install by @sgugger in #22697
  • Enable naive Pipeline Parallelism training for Gpt neox japanese and san japanese by @mayankagarwals in #22702
  • Clarify stride option by @luccailliau in #22684
  • Remove 2 failing ONNX conversion tests by @ydshieh in #22660
  • Replace -100s in predictions by the pad token by @sgugger in #22693
  • Fix decorator order by @ydshieh in #22708
  • Update input values for docstring by @amyeroberts in #22631
  • remove wrong doc in readme by @ArthurZucker in #22723
  • Added parallel device usage for GPT-J by @jprivera44 in #22713
  • add model resources for CPMAnt (new) by @pioliverse in #20906
  • Modify pipeline_tutorial.mdx by @ARKA1112 in #22726
  • [tests] switch to torchrun by @stas00 in #22712
  • torch.distributed group initialization for torch_neuron disabled when optimum-neuron is installed by @michaelbenayoun in #22728
  • add fast support and option by @ArthurZucker in #22724
  • Update warning levels by @NielsRogge in #22727
  • Fix docstrings for TF BLIP by @Rocketknight1 in #22618
  • [Doctest] Add configuration_m2m_100.py by @elabongaatuo in #22733
  • [Doctest] Add configuration_mvp.py by @elabongaatuo in #22735
  • Indexing fix for gpt_bigcode by @jlamypoirier in #22737
  • Make vilt, switch_transformers compatible with model parallelism by @Xrenya in #22703
  • [Pix2struct] Simplify generation by @NielsRogge in #22527

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @zphang
    • LLaMA Implementation (#21955)
  • @Seb0
    • fix(docs): fix task guide links in model docs (#22226)
  • @mnaylor5
    • Add Mega: Moving Average Equipped Gated Attention (#21766)
  • @Shubhamai
    • Resnet flax (#21472)
    • Flax Regnet (#21867)
  • @wonhyeongseo
    • docs: ko: complete _toctree.yml (#22581)
    • 🌐 [i18n-KO] Translated pipeline_tutorial.mdx to Korean (#22508)
  • @jlamypoirier
    • Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) (#22575)
    • Indexing fix for gpt_bigcode (#22737)
  • @pioliverse
    • add model resources for CPMAnt (new) (#20906)
