v4.43.0: Llama 3.1, Chameleon, ZoeDepth, Hiera


Llama

The Llama 3.1 models are released by Meta and come in three flavours: 8B, 70B, and 405B.

To get an overview of Llama 3.1, please visit the Hugging Face announcement blog post.

We also release a repository of Llama recipes to showcase usage for inference, as well as full and partial fine-tuning of the different variants.
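
As a quick start, here is a minimal generation sketch (the 8B Instruct checkpoint is gated on the Hub, so this assumes you have been granted access):

from transformers import pipeline
import torch

# Load Llama 3.1 8B Instruct through the text-generation pipeline
pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
print(pipe("The key to life is", max_new_tokens=32))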


Chameleon

The Chameleon model was proposed in Chameleon: Mixed-Modal Early-Fusion Foundation Models by the Meta AI Chameleon Team. Chameleon is a vision-language model that uses vector quantization to tokenize images, which enables the model to generate multimodal output. The model takes images and text as input, including an interleaved format, and generates textual responses.
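
A minimal image-and-text-to-text sketch (the checkpoint name here is an assumption; check the Hub for the released Chameleon weights):

import requests
import torch
from PIL import Image
from transformers import ChameleonForConditionalGeneration, ChameleonProcessor

processor = ChameleonProcessor.from_pretrained("facebook/chameleon-7b")
model = ChameleonForConditionalGeneration.from_pretrained(
    "facebook/chameleon-7b", torch_dtype=torch.bfloat16, device_map="auto"
)

# <image> marks where the image tokens are interleaved with the text
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "What do you see in this image?<image>"

inputs = processor(prompt, images=image, return_tensors="pt").to(model.device, dtype=torch.bfloat16)
out = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(out[0], skip_special_tokens=True))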

ZoeDepth

The ZoeDepth model was proposed in ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth by Shariq Farooq Bhat, Reiner Birkl, Diana Wofk, Peter Wonka, Matthias Müller. ZoeDepth extends the DPT framework for metric (also called absolute) depth estimation. ZoeDepth is pre-trained on 12 datasets using relative depth and fine-tuned on two domains (NYU and KITTI) using metric depth. A lightweight head is used with a novel bin adjustment design called metric bins module for each domain. During inference, each input image is automatically routed to the appropriate head using a latent classifier.
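
A short sketch using the depth-estimation pipeline (the checkpoint name is an assumption; check the Hub for the released ZoeDepth weights):

import requests
from PIL import Image
from transformers import pipeline

depth_estimator = pipeline("depth-estimation", model="Intel/zoedepth-nyu-kitti")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

result = depth_estimator(image)
result["depth"].show()  # PIL image of the predicted depth map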

Hiera

Hiera was proposed in Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles by Chaitanya Ryali, Yuan-Ting Hu, Daniel Bolya, Chen Wei, Haoqi Fan, Po-Yao Huang, Vaibhav Aggarwal, Arkabandhu Chowdhury, Omid Poursaeed, Judy Hoffman, Jitendra Malik, Yanghao Li, and Christoph Feichtenhofer.

The paper introduces “Hiera,” a hierarchical Vision Transformer that simplifies the architecture of modern hierarchical vision transformers by removing unnecessary components without compromising on accuracy or efficiency. Unlike traditional transformers that add complex vision-specific components to improve supervised classification performance, Hiera demonstrates that such additions, often termed “bells-and-whistles,” are not essential for high accuracy. By leveraging a strong visual pretext task (MAE) for pretraining, Hiera retains simplicity and achieves superior accuracy and speed both in inference and training across various image and video recognition tasks. The approach suggests that spatial biases required for vision tasks can be effectively learned through proper pretraining, eliminating the need for added architectural complexity.
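
A hedged image-classification sketch (the exact checkpoint name is an assumption; see the Hub for the released Hiera weights):

import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

ckpt = "facebook/hiera-tiny-224-in1k-hf"  # assumed checkpoint name
processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModelForImageClassification.from_pretrained(ckpt)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])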

Agents

Our ReactAgent has a specific way to return its final output: it calls the tool final_answer, added to the user-defined toolbox upon agent initialization, with the answer as the tool argument. We found that even for a one-shot agent like CodeAgent, using a specific final_answer tool helps the llm_engine figure out what to return: so we generalized the final_answer tool to all agents, as sketched below.
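
A minimal sketch, assuming the transformers.agents API shipped in this release (the HfEngine default engine and the toolbox attribute as named here are assumptions):

from transformers.agents import CodeAgent, HfEngine

# final_answer is appended to the user-defined toolbox at initialization
agent = CodeAgent(tools=[], llm_engine=HfEngine())
print("final_answer" in agent.toolbox.tools)  # expected: True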

Now, if your code-based agent (like ReactCodeAgent) defines a function at step 1, it will remember the function definition for the rest of the run. This means your agent can create its own tools for later reuse!

This is a transformative PR: it allows the agent to regularly run a dedicated step to plan its actions in advance. Planning is activated by setting planning_interval to an integer upon agent initialization. At step 0, an initial plan is produced. At later steps (e.g. steps 3, 6, and 9 if you set planning_interval=3), the agent updates this plan based on the history of previous steps. More detail soon!
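
A rough sketch of enabling periodic planning (the engine and model names are assumptions):

from transformers.agents import HfEngine, ReactCodeAgent

# Step 0 produces an initial plan; the plan is then revisited every 3 steps
agent = ReactCodeAgent(
    tools=[],
    llm_engine=HfEngine("meta-llama/Meta-Llama-3.1-70B-Instruct"),
    planning_interval=3,
)
agent.run("How many seconds would it take a leopard at full speed to cross Pont des Arts?")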

Notable changes to the codebase

A significant RoPE refactor was done to make it model agnostic and more easily adaptable to any architecture.
It is only applied to Llama for now but will be applied to all models using RoPE over the coming days.
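
As a hedged illustration of the config-driven design (the exact keys accepted per rope type may vary):

from transformers import LlamaConfig

# RoPE behavior is selected via the rope_scaling dict on the config;
# "rope_type" picks the strategy and "factor" stretches the context window.
config = LlamaConfig(rope_scaling={"rope_type": "dynamic", "factor": 2.0})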

Breaking changes

TextGenerationPipeline and tokenizer kwargs

🚨🚨 This PR changes TextGenerationPipeline to rely on the tokenizer's defaults when tokenizer kwargs such as add_special_tokens are left unset. Some models used through TextGenerationPipeline previously did not get a <bos> token prepended by default, which (negatively) impacted their performance; with this change they now follow the tokenizer's defaults. In practice, this is a breaking change.

Example of a script whose output changes as a result of this PR:

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2-9b-it", torch_dtype=torch.bfloat16, device_map="auto")
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
# The prompt is now tokenized with the tokenizer's defaults, so Gemma's
# <bos> token is prepended and the generated text changes accordingly.
print(pipe("Foo bar"))
  • 🚨🚨 TextGenerationPipeline: rely on the tokenizer default kwargs by @gante in #31747

Bugfixes and improvements

  • Fix post gemma merge by @ArthurZucker in #31660
  • Fix float out of range in owlvit and owlv2 when using FP16 or lower precision by @aliencaocao in #31657
  • [docs] Llama3 by @stevhliu in #31662
  • [HybridCache] Fix get_seq_length method by @sanchit-gandhi in #31661
  • don't zero out the attention_mask when using sliding window with flash attention by @winglian in #31670
  • Fix Gemma2 4d attention mask by @hiyouga in #31674
  • Fix return_dict in encodec by @jla524 in #31646
  • add gather_use_object arguments by @SangbumChoi in #31514
  • Gemma capping is a must for big models by @ArthurZucker in #31698
  • Add French version of run scripts tutorial by @jadechoghari in #31483
  • dependencies: keras-nlp<0.14 pin by @gante in #31684
  • remove incorrect urls pointing to the llava repository by @BiliBraker in #31107
  • Move some test files (tests/test_xxx_utils.py) to tests/utils by @ydshieh in #31730
  • Fix mistral ONNX export by @fxmarty in #31696
  • [whisper] static kv cache by @sanchit-gandhi in #31166
  • Make tool JSON schemas consistent by @Rocketknight1 in #31756
  • Fix documentation for Gemma2. by @jbornschein in #31682
  • fix assisted decoding by @jiqing-feng in #31401
  • Requires for torch.tensor before casting by @echarlaix in #31755
  • handle (processor_class, None) returned by ModelPatterns by @molbap in #31753
  • Gemma 2: Update slow tests by @gante in #31759
  • Add ignore_errors=True to trainer.py rmtree in _inner_training_loop by @njbrake in #31668
  • [fix bug] logits's shape different from label's shape in preprocess_logits_for_metrics by @wiserxin in #31447
  • Fix RT-DETR cache for generate_anchors by @qubvel in #31671
  • Fix RT-DETR weights initialization by @qubvel in #31724
  • pytest_num_workers=4 for some CircleCI jobs by @ydshieh in #31764
  • Fix Gemma2 types by @hiyouga in #31779
  • Add torch_empty_cache_steps to TrainingArguments by @aliencaocao in #31546
  • Fix ClapProcessor to merge feature_extractor output into the returned BatchEncoding by @mxkopy in #31767
  • Fix serialization for offloaded model by @SunMarc in #31727
  • Make tensor device correct when ACCELERATE_TORCH_DEVICE is defined by @kiszk in #31751
  • Exclude torch.compile time from metrics computation by @zxd1997066 in #31443
  • Update CometCallback to allow reusing of the running experiment by @Lothiraldan in #31366
  • Fix gemma tests by @ydshieh in #31794
  • Add training support for SigLIP by @aliencaocao in #31495
  • Repeating an important warning in the chat template docs by @Rocketknight1 in #31796
  • Allow FP16 or other precision inference for Pipelines by @aliencaocao in #31342
  • Fix galore lr display with schedulers by @vasqu in #31710
  • Fix Wav2Vec2 Fairseq conversion (weight norm state dict keys) by @gau-nernst in #31714
  • Depth Anything: update conversion script for V2 by @pcuenca in #31522
  • Fix Seq2SeqTrainer crash when BatchEncoding data is None by @iohub in #31418
  • Bump certifi from 2023.7.22 to 2024.7.4 in /examples/research_projects/decision_transformer by @dependabot[bot] in #31813
  • Add FA2 and sdpa support for SigLIP by @qubvel in #31499
  • Bump transformers from 4.26.1 to 4.38.0 in /examples/tensorflow/language-modeling-tpu by @dependabot[bot] in #31837
  • Bump certifi from 2023.7.22 to 2024.7.4 in /examples/research_projects/lxmert by @dependabot[bot] in #31838
  • Fix typos by @omahs in #31819
  • transformers.fx.symbolic_trace supports inputs_embeds by @fxmarty in #31574
  • Avoid failure TFBlipModelTest::test_pipeline_image_to_text by @ydshieh in #31827
  • Fix incorrect accelerator device handling for MPS in TrainingArguments by @andstor in #31812
  • Mamba & RecurrentGemma: enable strict signature by @gante in #31549
  • Deprecate vocab_size in other two VLMs by @zucchini-nlp in #31681
  • FX symbolic_trace: do not test decoder_inputs_embeds by @fxmarty in #31840
  • [Grounding DINO] Add processor to auto mapping by @NielsRogge in #31845
  • chore: remove duplicate words by @hattizai in #31853
  • save_pretrained: use tqdm when saving checkpoint shards from offloaded params by @kallewoof in #31856
  • Test loading generation config with safetensor weights by @gante in #31550
  • docs: typo in tf qa example by @chen-keinan in #31864
  • Generate: Add new decoding strategy "DoLa" in .generate() by @voidism in #29619
  • Fix _init_weights for ResNetPreTrainedModel by @ydshieh in #31851
  • Update depth estimation task guide by @merveenoyan in #31860
  • Bump zipp from 3.7.0 to 3.19.1 in /examples/research_projects/decision_transformer by @dependabot[bot] in #31871
  • Add return type annotation to PreTrainedModel.from_pretrained by @mauvilsa in #31869
  • Revert "Fix _init_weights for ResNetPreTrainedModel" by @ydshieh in #31868
  • Bump certifi from 2023.7.22 to 2024.7.4 in /examples/research_projects/visual_bert by @dependabot[bot] in #31872
  • add warning when using gradient_checkpointing with FSDP full shard by @yundai424 in #31578
  • Add conversion for interleave llava by @zucchini-nlp in #31858
  • remove duplicate words in msg by @yukionfire in #31876
  • Fix file type checks in data splits for contrastive training example script by @npyoung in #31720
  • Fix failed tests in #31851 by @ydshieh in #31879
  • fix: Removed duplicate field definitions in some classes by @Sai-Suraj-27 in #31888
  • Push sharded checkpoint to hub when push_to_hub=True in TrainingArguments by @SunMarc in #31808
  • [RT-DETR] Add resources by @NielsRogge in #31815
  • Modify warnings in a with block to avoid flaky tests by @ydshieh in #31893
  • Add a condition for nested_detach by @haikuoxin in #31855
  • InstructBlipVideo: Update docstring by @zucchini-nlp in #31886
  • Fixes to alternating SWA layers in Gemma2 by @turboderp in #31775
  • Processor accepts any kwargs by @zucchini-nlp in #31889
  • [ConvertSlow] make sure the order is preserved for addedtokens by @ArthurZucker in #31902
  • [Gemma2] Support FA2 softcapping by @ArthurZucker in #31887
  • Fix missing methods for Fuyu by @Isotr0py in #31880
  • fix: Fixed the 1st argument name in classmethods by @Sai-Suraj-27 in #31907
  • add gather_use_object arguments II by @SangbumChoi in #31799
  • Add warning message for beta and gamma parameters by @OmarManzoor in #31654
  • Fix fx tests with inputs_embeds by @fxmarty in #31862
  • Refactor flash attention implementation in transformers by @ArthurZucker in #31446
  • Generate: fix SlidingWindowCache.reset() by @gante in #31917
  • 🚨 fix(SigLip): remove spurious exclusion of first vision output token by @transmissions11 in #30952
  • Allow Trainer.get_optimizer_cls_and_kwargs to be overridden by @apoorvkh in #31875
  • [Bug Fix] fix qa pipeline tensor to numpy by @jiqing-feng in #31585
  • Docker: TF pin on the consistency job by @gante in #31928
  • fix prompt strip to support tensors and np arrays by @AvivSham in #27818
  • Fix GenerationMixin.generate compatibility with pytorch profiler by @fxmarty in #31935
  • Generate: remove deprecated code due to Cache and cache_position being default by @gante in #31898
  • Generate: v4.42 deprecations 🧹🧹 by @gante in #31956
  • Whisper: move to tensor cpu before converting to np array at decode time by @gante in #31954
  • fix: Removed a wrong key-word argument in sigmoid_focal_loss() function call by @Sai-Suraj-27 in #31951
  • Generate: handle logits_warper update in models with custom generate fn by @gante in #31957
  • fix: Fixed the arguments in create_repo() function call by @Sai-Suraj-27 in #31947
  • Notify new docker images built for circleci by @ydshieh in #31701
  • Avoid race condition by @ydshieh in #31973
  • Masking: remove flakiness from test by @gante in #31939
  • Generate: doc nits by @gante in #31982
  • Fix the incorrect permutation of gguf by @PenutChen in #31788
  • Cambricon MLUs support SDPA and flash_attn by @huismiling in #31102
  • Speedup model init on CPU (by 10x+ for llama-3-8B as one example) by @muellerzr in #31771
  • [tests] fix deepspeed zero3 config for test_stage3_nvme_offload by @faaany in #31881
  • Fix bad test about slower init by @muellerzr in #32002
  • Tests: remove cuda versions when the result is the same 🧹🧹 by @gante in #31955
  • Bug report update by @gante in #31983
  • add flash-attn deterministic option to flash-attn>=2.4.1 by @junrae6454 in #31961
  • fix: Fixed incorrect dictionary assignment in src/transformers/__init__.py by @Sai-Suraj-27 in #31993
  • Bug report update -- round 2 by @gante in #32006
  • Fix gather when collecting 'num_input_tokens_seen' by @CodeCreator in #31974
  • Fix if else and actually enable superfast init by @muellerzr in #32007
  • SpeechEncoderDecoder doesn't support param buffer assignments by @muellerzr in #32009
  • Fix tests skip by @qubvel in #32012
  • Fixed log messages that are resulting in TypeError due to too many arguments by @Sai-Suraj-27 in #32017
  • Fix typo in classification function selection logic to improve code consistency by @moses in #32031
  • doc: fix broken BEiT and DiNAT model links on Backbone page by @dvrogozh in #32029
  • Pass missing arguments to SeamlessM4Tv2ConformerEncoderLayer.forward() when gradient checkpointing is enabled by @anferico in #31945
  • Add language to word timestamps for Whisper by @robinderat in #31572
  • Add sdpa and FA2 for CLIP by @qubvel in #31940
  • unpin numpy<2.0 by @ydshieh in #32018
  • Chameleon: minor fixes after shipping by @zucchini-nlp in #32037
  • Bump scikit-learn from 1.0.2 to 1.5.0 in /examples/research_projects/decision_transformer by @dependabot[bot] in #31458
  • Bump scikit-learn from 1.1.2 to 1.5.0 in /examples/research_projects/codeparrot/examples by @dependabot[bot] in #32052
  • [mistral] Support passing head_dim through config (and do not require head_dim * num_heads == hidden_size) by @xenova in #32050
  • Add torch.compile Support For Mamba by @zhenglongjiepheonix in #31247
  • fix: Removed duplicate entries in a dictionary by @Sai-Suraj-27 in #32041
  • docs: Fixed 2 links in the docs along with some minor fixes by @Sai-Suraj-27 in #32058
  • Llava: add default chat templates by @zucchini-nlp in #31691
  • [Chameleon, Hiera] Improve docs by @NielsRogge in #32038
  • Incorrect Whisper long-form decoding timestamps by @kamilakesbi in #32003
  • [mistral] Fix FA2 attention reshape for Mistral Nemo by @xenova in #32065
  • VideoLLaVa: fix chat format in docs by @zucchini-nlp in #32083
  • Fix progress callback deepcopy by @fozziethebeat in #32070
  • Fixes to chameleon docs by @merveenoyan in #32078
  • Add image-text-to-text task guide by @merveenoyan in #31777
  • Support generating with fallback for short form audio in Whisper by @kamilakesbi in #30984
  • Disable quick init for deepspeed by @muellerzr in #32066
  • Chameleon: not supported with fast load by @zucchini-nlp in #32091
  • Fix tests after huggingface_hub 0.24 by @Wauplin in #32054
  • Fix shard order by @b-chu in #32023
  • Generate: store special token tensors under a unique variable name by @gante in #31980
  • fix: Replaced deprecated mktemp() function by @Sai-Suraj-27 in #32123
  • Mention model_info.id instead of model_info.modelId by @Wauplin in #32106
  • [generate] fix eos/pad id check on mps devices by @sanchit-gandhi in #31695
  • Fix failing test with race condition by @Rocketknight1 in #32140
  • Update ko/_toctree.yml and remove custom_tools.md to reflect latest changes by @jungnerd in #31969
  • fix: Fixed raising TypeError instead of ValueError for invalid type by @Sai-Suraj-27 in #32111
  • [RoBERTa] Minor clarifications to model doc by @bt2513 in #31949
  • Return assistant generated tokens mask in apply_chat_template by @yonigottesman in #30650
  • Don't default to other weights file when use_safetensors=True by @amyeroberts in #31874
  • set warning level to info for special tokens have been added by @ArthurZucker in #32138
  • Add new quant method by @SunMarc in #32047
  • Add llama3-llava-next-8b to llava_next conversion script by @jamt9000 in #31395
  • LLaVaNeXT: pad on right if training by @zucchini-nlp in #32134
  • Remove trust_remote_code when loading Libri Dummy by @sanchit-gandhi in #31748
  • [modelling] remove un-necessary transpose for fa2 attention by @sanchit-gandhi in #31749
  • Fix mask creations of GPTNeoX and GPT2 by @vasqu in #31944
  • Add method to retrieve used chat template by @KonradSzafer in #32032
  • Add YaRN and Dynamic-YaRN RoPE Scaling Methods by @mig-mfreitas in #30910
  • Disable quick init for TapasPreTrainedModel by @daniellok-db in #32149
  • Modify resize_token_embeddings to ensure output type is same as input by @bayllama in #31979
  • gguf conversion add_prefix_space=None for llama3 by @itazap in #31937
  • Fix flash attention speed issue by @Cyrilvallez in #32028
  • Fix video batching to videollava by @merveenoyan in #32139
  • Added mamba.py backend by @alxndrTL in #30139
  • Rename Phi-3 rope scaling type by @garg-amit in #31436
  • Revert "Incorrect Whisper long-form decoding timestamps " by @sanchit-gandhi in #32148
  • Fix typing to be compatible with later py versions by @amyeroberts in #32155
  • feat(cache): StaticCache uses index_copy_ to avoid useless copy by @tengomucho in #31857
  • Added additional kwarg for successful running of optuna hyperparameter search by @DeF0017 in #31924
  • Enhancing SFT Training Efficiency Using Packing and FlashAttention2 with Position IDs by @RhuiDih in #31629

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @aliencaocao
    • Fix float out of range in owlvit and owlv2 when using FP16 or lower precision (#31657)
    • Add torch_empty_cache_steps to TrainingArguments (#31546)
    • Add training support for SigLIP (#31495)
    • Allow FP16 or other precision inference for Pipelines (#31342)
  • @voidism
    • Generate: Add new decoding strategy "DoLa" in .generate() (#29619)
  • @Namangarg110
