github huggingface/transformers v4.29.0
v4.29.0: Transformers Agents, SAM, RWKV, FocalNet, OpenLLaMa


Transformers Agents

Transformers Agents is a new API that lets you use the library and Diffusers by prompting an agent (a large language model) in natural language. The agent then outputs code that uses a set of predefined tools, leveraging the appropriate state-of-the-art models for the task the user wants to perform. It is fully multimodal and extensible by the community. Learn more in the docs.
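
The mechanism described above (a language model writes code against a fixed set of tools, and that code is then executed) can be illustrated with a toy sketch. Nothing here is the library's API: `fake_llm`, `TOOLS`, `run_agent`, and both tools are hypothetical stand-ins, and the "model" simply returns a hard-coded program.

```python
# Toy illustration only: every name below is hypothetical, not part of transformers.

def shout(text):
    """Stand-in tool: upper-case a string."""
    return text.upper()

def word_count(text):
    """Stand-in tool: count whitespace-separated words."""
    return len(text.split())

# The predefined tools the agent is allowed to call.
TOOLS = {"shout": shout, "word_count": word_count}

def fake_llm(prompt):
    """Stand-in for the large language model: returns code that uses the tools."""
    return "result = word_count(shout(text))"

def run_agent(task, **variables):
    # Describe the available tools and the task in the prompt.
    prompt = f"Tools: {', '.join(TOOLS)}\nTask: {task}"
    code = fake_llm(prompt)
    # Execute the generated code with access to the tools and user variables only.
    namespace = {"__builtins__": {}, **TOOLS, **variables}
    exec(code, namespace)
    return namespace["result"]

answer = run_agent("Count the words in the shouted text", text="hello world")
```

A real agent would send the prompt to an actual model and would need far more careful sandboxing of the generated code than this sketch provides.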

SAM

SAM (Segment Anything Model) was proposed in Segment Anything by Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick.

The model can be used to predict segmentation masks of any object of interest given an input image.

RWKV

RWKV suggests a tweak to the traditional Transformer attention that makes it linear. This way, the model can be used as a recurrent network: passing inputs for timestamp 0 and timestamp 1 together is the same as passing inputs at timestamp 0, then inputs at timestamp 1 along with the state of timestamp 0.

This can be more efficient than a regular Transformer and can deal with sentences of any length (even if the model uses a fixed context length for training).
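
The recurrence property can be checked on a toy layer. The sketch below is not the RWKV formulation: it is a simplified, decayed weighted average, chosen only because it keeps a running state the same way. The function name, the `decay` parameter, and the input values are all assumptions made for illustration.

```python
import math

def toy_recurrent_layer(keys, values, state=None, decay=0.9):
    """Process a chunk of (key, value) pairs and return (outputs, state).

    The state is a pair (num, den): exponentially decayed sums of
    exp(k) * v and exp(k) over everything seen so far.
    """
    num, den = (0.0, 0.0) if state is None else state
    outputs = []
    for k, v in zip(keys, values):
        weight = math.exp(k)
        num = decay * num + weight * v
        den = decay * den + weight
        outputs.append(num / den)
    return outputs, (num, den)

keys = [0.1, -0.3, 0.5, 0.2]
values = [1.0, 2.0, -1.0, 0.5]

# All timesteps in a single call.
full_outputs, _ = toy_recurrent_layer(keys, values)

# The same inputs in two chunks, carrying the state across the boundary.
first, state = toy_recurrent_layer(keys[:2], values[:2])
second, _ = toy_recurrent_layer(keys[2:], values[2:], state=state)

# Chunked processing with the carried state reproduces the full pass.
assert first + second == full_outputs
```

Because the state has a fixed size regardless of how many inputs came before, the same loop can keep consuming tokens past any training context length.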

FocalNet

The FocalNet model was proposed in Focal Modulation Networks by Jianwei Yang, Chunyuan Li, Xiyang Dai, Lu Yuan, Jianfeng Gao. FocalNets completely replace self-attention (used in models like ViT and Swin) with a focal modulation mechanism for modeling token interactions in vision. The authors claim that FocalNets outperform self-attention based models with similar computational costs on the tasks of image classification, object detection, and segmentation.

OpenLLaMa

The Open-Llama model was proposed in the Open-Llama project by community developer s-JoL.

The model is mainly based on LLaMA with some modifications, incorporating memory-efficient attention from Xformers, stable embedding from Bloom, and shared input-output embedding from PaLM. The model is also pre-trained on both Chinese and English, which gives it better performance on Chinese language tasks.

Assisted Generation

Assisted generation is a new technique that lets you speed up generation with large language models by using a smaller model as an assistant. The assistant model does the multiple forward passes needed to propose candidate tokens, while the LLM merely validates the tokens proposed by the assistant. This can lead to speed-ups of up to 10x!

  • Generate: Add assisted generation by @gante in #22211
  • Generate: assisted generation with sample (take 2) by @gante in #22949
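
The propose-and-validate loop described above can be sketched with two toy "models" that map a token list to the next token. This is a greedy-only illustration, not the library's implementation: `target_next`, `draft_next`, and the parameter `k` are hypothetical, and the validation loop stands in for what would be a single forward pass of the large model over all candidate positions.

```python
def target_next(tokens):
    """Hypothetical large model: a deterministic next-token rule."""
    return (tokens[-1] * 3 + 1) % 10

def draft_next(tokens):
    """Hypothetical small assistant: agrees with the target except at every 4th position."""
    nxt = target_next(tokens)
    return nxt if len(tokens) % 4 else (nxt + 1) % 10

def greedy(prompt, n):
    """Plain greedy decoding: one target call per new token."""
    tokens = list(prompt)
    for _ in range(n):
        tokens.append(target_next(tokens))
    return tokens

def assisted(prompt, n, k=3):
    """Greedy decoding where the assistant proposes k tokens per round."""
    tokens = list(prompt)
    limit = len(prompt) + n
    rounds = 0  # each round models one validation pass of the large model
    while len(tokens) < limit:
        # The assistant runs k cheap forward passes to propose candidates.
        draft = list(tokens)
        for _ in range(k):
            draft.append(draft_next(draft))
        candidates = draft[len(tokens):]
        # The large model validates: keep its own token at each position and
        # stop at the first position where the candidate disagrees.
        rounds += 1
        for cand in candidates:
            if len(tokens) == limit:
                break
            expected = target_next(tokens)
            tokens.append(expected)
            if expected != cand:
                break
    return tokens, rounds

reference = greedy([1], 12)
output, rounds = assisted([1], 12)
assert output == reference  # same text as plain greedy decoding
assert rounds < 12          # fewer large-model rounds than new tokens
```

In the library itself, the feature is exposed through the `assistant_model` argument of `generate`.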

Code on the Hub from another repo

To avoid duplicating the model code in multiple repos when using the code on the Hub feature, loading such models now saves in their config the repo in which the code lives. This way there is a single source of truth for code on the Hub models.

Breaking changes

This release has three breaking changes compared to version v4.28.0.

The first one focuses on fixing training issues for Pix2Struct. This slightly affects the results, but should result in the model training much better.

  • 🚨🚨🚨 [Pix2Struct] Attempts to fix training issues 🚨🚨🚨 by @younesbelkada in #23004

The second one aligns the ignore index in the LUKE model with the other models in the library. This breaks the convention that models should stick to their original implementation, but it was necessary in order to align with other transformers in the library.

Finally, the third breaking change aims to harmonize the training procedure for most of the recent additions in transformers. It is now the user's responsibility to mask the padding tokens of the labels with the correct value. This PR addresses the issue that was raised by other architectures such as Luke or Pix2Struct.
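
As a minimal sketch of what masking the label padding can look like, the snippet below replaces pad tokens with an ignore index before the labels reach the loss. The value -100 is an assumption here (it is the default `ignore_index` of PyTorch's `CrossEntropyLoss`), and `mask_padding` and the example token ids are hypothetical; check the model documentation for the value it expects.

```python
IGNORE_INDEX = -100  # assumed ignore value; default ignore_index of torch.nn.CrossEntropyLoss

def mask_padding(labels, pad_token_id):
    """Return a copy of a batch of label ids with padding replaced by IGNORE_INDEX."""
    return [
        [IGNORE_INDEX if token == pad_token_id else token for token in row]
        for row in labels
    ]

# Hypothetical batch of two label sequences padded with id 0.
batch = [[5, 8, 2, 0, 0], [7, 2, 0, 0, 0]]
masked = mask_padding(batch, pad_token_id=0)
```

With tensors, the same replacement is typically a one-line `masked_fill` on the label tensor.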

Bugfixes and improvements

  • Change torch_dtype to str when saved_model=True in save_pretrained for TF models by @ydshieh in #22740
  • 🌐 [i18n-KO] Translated training.mdx to Korean by @gabrielwithappy in #22670
  • Remove DS_BUILD_AIO=1 by @ydshieh in #22741
  • [trainer] update url by @stas00 in #22747
  • fix(llama): fix LlamaTokenzier by @rockmagma02 in #22746
  • Generate: handle text conditioning with multimodal encoder-decoder models by @gante in #22748
  • Revert (for now) the change on Deta in #22437 by @ydshieh in #22750
  • Fix serving_output for TF composite models (encoder-decoder like models) by @ydshieh in #22743
  • 🌐 [i18n-KO] Translated sequence_classification.mdx to Korean by @0525hhgus in #22655
  • [Examples] TPU-based training of a language model using TensorFlow by @sayakpaul in #21657
  • Pix2struct: doctest fix by @gante in #22761
  • Generate: pin number of beams in BART test by @gante in #22763
  • Fix a mistake in Llama weight converter log output. by @aljungberg in #22764
  • Fix failing torchscript tests for CpmAnt model by @ydshieh in #22766
  • [WIP]🌐 [i18n-KO] Translated tutorial/proprecssing.mdx to Korean by @sim-so in #22578
  • Tweak ESM tokenizer for Nucleotide Transformer by @Rocketknight1 in #22770
  • Fix word_ids hyperlink by @mayankagarwals in #22765
  • Seq2SeqTrainer: Evict decoder_input_ids only when it is created from labels by @gante in #22772
  • Indexing fix - CLIP checkpoint conversion by @amyeroberts in #22776
  • Move labels to the same device as logits for Whisper by @oscar-garzon in #22779
  • Generate: add CJK support to TextStreamer by @bcol23 in #22664
  • Fix test_word_time_stamp_integration for Wav2Vec2ProcessorWithLMTest by @ydshieh in #22800
  • 🌐 [i18n-KO] Translated custom_models.mdx to Korean by @HanNayeoniee in #22534
  • [i18n-KO] fix: docs: ko: sagemaker anchors and _toctree.yml by @jungnerd in #22549
  • improve(llama): Faster apply_rotary_pos_emb by @fpgaminer in #22785
  • Fix sneaky torch dependency in TF example by @Rocketknight1 in #22804
  • 🌐 [i18n-KO] Translated tasks/translation.mdx to Korean by @wonhyeongseo in #22805
  • Don't use LayoutLMv2 and LayoutLMv3 in some pipeline tests by @ydshieh in #22774
  • Fix squeeze into torch 1.x compatible form in llama model by @DyeKuu in #22808
  • Remove accelerate from tf test reqs by @muellerzr in #22777
  • Simplify update metadata job by @sgugger in #22811
  • Revert "Use code on the Hub from another repo" by @sgugger in #22813
  • Introduce PartialState as the device handler in the Trainer by @muellerzr in #22752
  • Mark auto models as important by @sgugger in #22815
  • TTS fine-tuning for SpeechT5 by @hollance in #21824
  • 🌐 [i18n-KO] Fix anchor links for docs auto_tutorial, training by @gabrielwithappy in #22796
  • Fix Past CI not running against the latest main by @ydshieh in #22823
  • Fix test_eos_token_id_int_and_list_top_k_top_sampling by @ydshieh in #22826
  • Update accelerate version + warning check fix by @muellerzr in #22833
  • Fix from_pretrained when model is instantiated on the meta device by @sgugger in #22837
  • Raise err if minimum Accelerate version isn't available by @muellerzr in #22841
  • Make ClipSeg compatible with model parallelism by @youssefadr in #22844
  • fix SpeechT5 doc comments by @hollance in #22854
  • move preprocess_logits_for_metrics before _nested_gather in trainer.e… by @ChenyangLiu in #22603
  • feat(model parallelism): move labels to the same device as logits for M2M100 by @elabongaatuo in #22850
  • use accelerate@main in CI by @ydshieh in #22859
  • Remove 'main' from doc links by @amyeroberts in #22860
  • Show diff between 2 CI runs on Slack reports by @ydshieh in #22798
  • Remove some pipeline skip cases by @ydshieh in #22865
  • Fixup multigpu local_rank by @muellerzr in #22869
  • Fix to removing ESM special tokens by @Rocketknight1 in #22870
  • XGLM: Fix left-padding (PT and TF) by @gante in #22828
  • Patching clip model to create mask tensor on the device by @shanmugamr1992 in #22711
  • fix: Correct small typo in docstring by @oscar-defelice in #22857
  • Generation: only search for eos_token if set by @xloem in #22875
  • Change schedule CI time by @ydshieh in #22884
  • fix warning function call creating logger error (max_length and max_new_tokens) by @QuentinAmbard in #22889
  • [Examples/TensorFlow] minor refactoring to allow compatible datasets to work by @sayakpaul in #22879
  • moved labels to the same device as logits for OTP, CODEGEN ,gptj and pixel2struct model by @sushmanthreddy in #22872
  • Include decoder_attention_mask in T5 model inputs by @aashiqmuhamed in #22835
  • Fix weight tying in TF-ESM by @Rocketknight1 in #22839
  • Pin flax & optax version by @amyeroberts in #22895
  • Revert DeepSpeed stuff from accelerate integration by @muellerzr in #22899
  • [tensorflow] Add support for the is_symbolic_tensor predicate by @hvaara in #22878
  • moved labels to the same device as logits for LILT model by @sushmanthreddy in #22898
  • Skip a failing test on main for now by @ydshieh in #22911
  • Moved labels to enable parallelism pipeline in Luke model by @sushmanthreddy in #22909
  • Fix counting in Slack report for some jobs by @ydshieh in #22913
  • Fix Slack report for Nightly CI and Past CI by @ydshieh in #22901
  • fix CLAP integration tests by @hollance in #22834
  • Add inputs_embeds functionality when generating with GPT-Neox by @TobiasLee in #22916
  • Fix FillMaskPipelineTests by @ydshieh in #22894
  • Update Swin MIM output class by @alaradirik in #22893
  • fix bug of CLAP dataloader by @lukewys in #22674
  • Fix: Seq2SeqTrainingArgs overriding to_dict for GenerationConfig json support by @Natooz in #22919
  • fix: GPTNeoX half inference error by @SeongBeomLEE in #22888
  • Remove broken test_data symlink in legacy s2s examples by @hvaara in #22876
  • Hardcode GELU as the intermediate activation for ESM by @Rocketknight1 in #22892
  • [CI] clap patch fusion test values by @ArthurZucker in #22922
  • ddp fixes for training by @winglian in #22874
  • tests: Fix flaky test for NLLB-MoE by @connor-henderson in #22880
  • Fix a minor bug in CI slack report by @ydshieh in #22906
  • Feature to convert videomae huge and small finetuned on kinetics and ssv2 added to the videomae to pytorch converter by @sandstorm12 in #22788
  • vilt_model by @sushmanthreddy in #22930
  • [i18n-KO] Translated accelerate.mdx to Korean by @0525hhgus in #22830
  • [CLAP] Doc nits by @ArthurZucker in #22957
  • Generate: Add exception path for Donut by @gante in #22955
  • Update tiny models and a few fixes by @ydshieh in #22928
  • 🌐 [i18n-KO] Translated tasks/masked_language_modeling.mdx to Korean by @HanNayeoniee in #22838
  • 🌐 [i18n-KO] Translated tasks/summarization.mdx to Korean by @sim-so in #22783
  • Add an attribute to disable custom kernels in deformable detr in order to make the model ONNX exportable by @fxmarty in #22918
  • Decorate test_codegen_sample_max_time as flaky by @ydshieh in #22953
  • Raise error if stride is too high in TokenClassificationPipeline by @boyleconnor in #22942
  • [Fix Bugs] Fix keys in _load_pretrained_model by @hanrui1sensetime in #22947
  • Prepare tests for hfh 0.14 by @Wauplin in #22958
  • 🌐 [i18n-KO] Translated run_scripts.mdx to Korean by @HanNayeoniee in #22793
  • Reverting Deta cloning mecanism. by @Narsil in #22656
  • fix ValueError message in LlamaAttention by @othertea in #22966
  • Fix TF example in quicktour by @Rocketknight1 in #22960
  • Update feature selection in to_tf_dataset by @amyeroberts in #21935
  • 🌐 [i18n-KO] translate create_a_model doc to Korean by @gabrielwithappy in #22754
  • Install accelerete@main in PyTorch Past CI jobs by @ydshieh in #22963
  • Fix DeepSpeed CI job link in Past CI by @ydshieh in #22967
  • 🌐 [i18n-KO] Fixed tasks/masked_language_modeling.mdx by @HanNayeoniee in #22965
  • Neptune fix bug init run by @AleksanderWWW in #22836
  • fixed small typo in code example by @jvanmelckebeke in #22982
  • Avoid invalid escape sequences, use raw strings by @Lingepumpe in #22936
  • [DocTest] Fix correct checkpoint by @younesbelkada in #22988
  • 🌐 [i18n-KO] Translated serialization.mdx to Korean by @wonhyeongseo in #22806
  • Fix typo in mega.mdx by @dleve123 in #22998
  • 🌐 [i18n-KO] Translated tasks/image_captioning.mdx to Korean by @sim-so in #22943
  • 🌐 [i18n-KO] Translated token_classification.mdx to Korean by @0525hhgus in #22945
  • Add TensorFlow Wav2Vec2 for sequence classification by @nandwalritik in #22073
  • Remove a failing ONNX test by @ydshieh in #23011
  • Add gradient checkpointing to Whisper Flax by @versae in #22954
  • [PEFT] Add HFTracer support for PEFT by @younesbelkada in #23006
  • [Llama Tokenizer] Fast llama template by @ArthurZucker in #22959
  • Fix None value when adding info to auto_map by @sgugger in #22990
  • Bring back PartialState DeepSpeed by @muellerzr in #22921
  • Add methods to PreTrainedModel to use PyTorch's BetterTransformer by @fxmarty in #21259
  • [Pix2Struct] Fix pix2struct doctest by @younesbelkada in #23023
  • 🌐 [i18n-KO] Translated multilingual.mdx to Korean by @HanNayeoniee in #23008
  • Fix the expected error in test_offline_mode_pipeline_exception by @ydshieh in #23022
  • [MEGA] nit size test by @ArthurZucker in #23028
  • added GPTNeoXForTokenClassification by @peter-sk in #23002
  • added GPTNeoForTokenClassification by @peter-sk in #22908
  • Update BridgeTowerModelTester by @ydshieh in #23029
  • Fix bigbird random attention by @Bearnardd in #21023
  • Fix CLAP link across all READMEs by @ehsanmok in #23032
  • Make _test_xla_generate less flaky by @ydshieh in #22996
  • Add Trainer support for ReduceLROnPlateau by @pie3636 in #23010
  • 🌐 [i18n-KO] Translated model_sharing.mdx to Korean by @0525hhgus in #22991
  • [docs] Doc TOC updates by @MKhalusova in #23049
  • Cuda rng_state_all is used when saving in distributed mode so same should also be used when loading by @ShivamShrirao in #23045
  • Skip pt/flax equivalence tests in pytorch bigbird test file by @ydshieh in #23040
  • Fix model parallelism for BridgeTower by @ydshieh in #23039
  • extend the test files by @ydshieh in #23043
  • Generate: prepare assisted generation for release by @gante in #23052
  • Fix grammar error in summarization pipeline by @SKaplanOfficial in #23080
  • Fix string syntax error in logger warning message (additional comma) by @xwen99 in #23083
  • Add BioGPTForSequenceClassification by @awinml in #22253
  • Fix convnext init by @IMvision12 in #23078
  • Depricate xpu_backend for ddp_backend by @muellerzr in #23085
  • 🌐 [i18n-KO] Translated tasks/image_classification.mdx to Korean by @0525hhgus in #23048
  • 🌐 [i18n-KO] Translated tasks/question_answering.mdx to Korean by @jungnerd in #23012
  • 🌐 [i18n-KO] Translated tasks/zero_shot_image_classification.mdx to Korean by @HanNayeoniee in #23065
  • added type hints for blip_text pytorch model by @iamarunbrahma in #23071
  • Save the tokenizer and image preprocessor after training a model with the contrastive image-text example by @regisss in #23035
  • GPT2ForQuestionAnswering by @peter-sk in #23030
  • 🌐 [i18n-KO] Translated torchscript.mdx to Korean by @sim-so in #23060
  • Fix check for backword_pos by @winglian in #23075
  • [Flava] Fix flava torch.distributed.nn.functional import all_gather issue by @younesbelkada in #23108
  • [ONNX] Sam fix by @michaelbenayoun in #23110
  • num_noise_spans should be <= num_items #22246 by @alexcpn in #22938
  • Fixed default config for Pix2Struct model to set Pix2StructTextModel to is_decoder=True by @gbarello-uipath in #23051
  • Pin numba for now by @sgugger in #23118
  • [Doctest] Fix pix2struct doctest by @younesbelkada in #23121
  • Generate: slow assisted generation test by @gante in #23125
  • Generate: correct beam search length on score calculation for multi batch generation by @gante in #23127
  • improve unclear documentation by @ManuelFay in #23123
  • Generate: better warnings with pipelines by @gante in #23128
  • Add resources for LayoutLmV2 and reformat documentation resources by @y3sar in #23115
  • Fix ConvNext V2 paramater naming issue by @alaradirik in #23122
  • Support union types X | Y syntax for HfArgumentParser for Python 3.10+ by @XuehaiPan in #23126
  • Add support for beam search's num_return_sequencs flag in flax by @mayankagarwals in #23082
  • docs: ko: update _toctree.yml by @HanNayeoniee in #23112
  • [doc] Try a few ≠ ways of linking to Papers, users, and org profiles by @julien-c in #22611
  • Enable to use custom tracer in FX symbolic_trace by @regisss in #23105
  • Remove redundant print statements by @alaradirik in #23133
  • Tidy Pytorch GLUE benchmark example by @tlby in #23134
  • GPTNeoForQuestionAnswering by @peter-sk in #23057
  • Add methods to update and verify out_features out_indices by @amyeroberts in #23031
  • fix spelling error by @digger-yu in #23143
  • Remove typo in perf_train_gpu_many.mdx by @MrGeislinger in #23144
  • fix resume fsdp by @qywu in #23111
  • gpt2 multi-gpu fix by @peter-sk in #23149
  • GPTNeoXForQuestionAnswering by @peter-sk in #23059
  • [GPT-J] Fix causal mask dtype by @younesbelkada in #23147
  • Add FlaxWhisperForAudioClassification model by @raghavanone in #22883
  • [docs] Text to speech task guide by @MKhalusova in #23107
  • Generate: text generation pipeline no longer emits max_length warning when it is not set by @gante in #23139
  • Revert "Add FlaxWhisperForAudioClassification model" by @sgugger in #23154
  • Add TrOCR resources by @huangperry in #23142
  • fixed whisper positional encoding by @anvilarth in #23167
  • 🌐 [i18n-KO] docs: ko: Translate multiple_choice.mdx by @gabrielwithappy in #23064
  • fix: Passing language as acronym to Whisper generate by @connor-henderson in #23141
  • Add no_trainer scripts to pre-train Vision Transformers by @awinml in #23156
  • Add FlaxWhisperForAudioClassification model by @raghavanone in #23173
  • search buffers for dtype by @cyyever in #23159
  • Update LLaMA docs with arxiv link by @awinml in #23191
  • fix random attention for pytorch's bigbird/pegasus_bigbird by @Bearnardd in #23056
  • Fix hf_argparser.parse_json_file to open file with utf-8 encoding, close file when finished by @RobertBaruch in #23194
  • Generate: starcoder 🤜 🤛 assisted generation by @gante in #23182
  • Fixing class embedding selection in owl-vit by @orrzohar in #23157
  • New version of Accelerate for the Trainer by @sgugger in #23204
  • docs: Fix broken link in 'How to add a model...' by @connor-henderson in #23216
  • Pin tensorflow-probability by @sgugger in #23220
  • [SAM] Add resources by @NielsRogge in #23224
  • audio_utils improvements by @hollance in #21998
  • make opt checkpoint dir name correct by @dumpmemory in #21660
  • Fix typo ; Update output.mdx by @furkanakkurt1335 in #23227
  • fix: Update run_qa.py to work with deepset/germanquad by @sjrl in #23225
  • Add Japanese translation to accelerate.mdx by @rustinwelter in #23232
  • Proposed fix for TF example now running on safetensors. by @Narsil in #23208
  • Support ratios for logging_steps, eval_steps, and save_steps by @konstantinjdobler in #23235
  • [Doctests] Refactor doctests + add CI by @ArthurZucker in #22987
  • Revert "[Doctests] Refactor doctests + add CI" by @sgugger in #23245
  • Fix from_config by @DyeKuu in #23246
  • CTC example: updated trainer parameters to save tokenizer by @MKhalusova in #23243
  • [docs] Audio task guides fixes by @MKhalusova in #23239
  • Improve Docs of Custom Tools and Agents by @patrickvonplaten in #23255
  • Metadata update by @LysandreJik in #23259
  • Update Image segmentation description by @LysandreJik in #23261
  • pin tensorflow-probability in docker files by @ydshieh in #23260
  • Refine documentation for Tools by @sgugger in #23266
  • Fix new line bug in chat mode for agents by @sgugger in #23267
  • Render custom tool docs a bit better by @sgugger in #23269
  • chore: allow protobuf 3.20.3 requirement by @jose-turintech in #22759
  • Fix link displayed for custom tools by @sgugger in #23274
  • Remove missplaced test file by @sgugger in #23275

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @gabrielwithappy
    • 🌐 [i18n-KO] Translated training.mdx to Korean (#22670)
    • 🌐 [i18n-KO] Fix anchor links for docs auto_tutorial, training (#22796)
    • 🌐 [i18n-KO] translate create_a_model doc to Korean (#22754)
    • 🌐 [i18n-KO] docs: ko: Translate multiple_choice.mdx (#23064)
  • @0525hhgus
    • 🌐 [i18n-KO] Translated sequence_classification.mdx to Korean (#22655)
    • [i18n-KO] Translated accelerate.mdx to Korean (#22830)
    • 🌐 [i18n-KO] Translated token_classification.mdx to Korean (#22945)
    • 🌐 [i18n-KO] Translated model_sharing.mdx to Korean (#22991)
    • 🌐 [i18n-KO] Translated tasks/image_classification.mdx to Korean (#23048)
  • @sim-so
    • [WIP]🌐 [i18n-KO] Translated tutorial/proprecssing.mdx to Korean (#22578)
    • 🌐 [i18n-KO] Translated tasks/summarization.mdx to Korean (#22783)
    • 🌐 [i18n-KO] Translated tasks/image_captioning.mdx to Korean (#22943)
    • 🌐 [i18n-KO] Translated torchscript.mdx to Korean (#23060)
  • @HanNayeoniee
    • 🌐 [i18n-KO] Translated custom_models.mdx to Korean (#22534)
    • 🌐 [i18n-KO] Translated tasks/masked_language_modeling.mdx to Korean (#22838)
    • 🌐 [i18n-KO] Translated run_scripts.mdx to Korean (#22793)
    • 🌐 [i18n-KO] Fixed tasks/masked_language_modeling.mdx (#22965)
    • 🌐 [i18n-KO] Translated multilingual.mdx to Korean (#23008)
    • 🌐 [i18n-KO] Translated tasks/zero_shot_image_classification.mdx to Korean (#23065)
    • docs: ko: update _toctree.yml (#23112)
  • @wonhyeongseo
    • 🌐 [i18n-KO] Translated tasks/translation.mdx to Korean (#22805)
    • 🌐 [i18n-KO] Translated serialization.mdx to Korean (#22806)
  • @peter-sk
    • added GPTNeoXForTokenClassification (#23002)
    • added GPTNeoForTokenClassification (#22908)
    • GPT2ForQuestionAnswering (#23030)
    • GPTNeoForQuestionAnswering (#23057)
    • gpt2 multi-gpu fix (#23149)
    • GPTNeoXForQuestionAnswering (#23059)
  • @s-JoL
    • add open-llama model with ckpt (#22795)
  • @awinml
    • Add BioGPTForSequenceClassification (#22253)
    • Add no_trainer scripts to pre-train Vision Transformers (#23156)
    • Update LLaMA docs with arxiv link (#23191)
  • @raghavanone
    • Add FlaxWhisperForAudioClassification model (#22883)
    • Add FlaxWhisperForAudioClassification model (#23173)
