v4.31.0: Llama v2, MusicGen, Bark, MMS, EnCodec, InstructBLIP, UMT5, MRA, ViViT

New models

Llama v2

Llama 2 was proposed in Llama 2: Open Foundation and Fine-Tuned Chat Models by Hugo Touvron et al. It builds upon the original LLaMA architecture, adding grouped-query attention (GQA) for more efficient inference.
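
As a rough, hedged illustration (the checkpoint name, dtype and generation settings below are assumptions, the official meta-llama checkpoints are gated on the Hub, and device_map="auto" assumes accelerate is installed), Llama 2 loads through the usual Auto classes:

```python
# Minimal sketch, not the canonical recipe: load a Llama 2 checkpoint and generate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint name (access is gated)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

inputs = tokenizer("Grouped-query attention speeds up inference because", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```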

MusicGen

The MusicGen model was proposed in the paper Simple and Controllable Music Generation by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Défossez.

MusicGen is a single stage auto-regressive Transformer model capable of generating high-quality music samples conditioned on text descriptions or audio prompts. The text descriptions are passed through a frozen text encoder model to obtain a sequence of hidden-state representations. MusicGen is then trained to predict discrete audio tokens, or audio codes, conditioned on these hidden-states. These audio tokens are then decoded using an audio compression model, such as EnCodec, to recover the audio waveform.

Through an efficient token interleaving pattern, MusicGen does not require a self-supervised semantic representation of the text/audio prompts, thus eliminating the need to cascade multiple models to predict a set of codebooks (e.g. hierarchically or upsampling). Instead, it is able to generate all the codebooks in a single forward pass.
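
As a minimal sketch of that single-stage setup (the checkpoint name, prompt and token budget are assumptions):

```python
# Text-conditioned music generation with MusicGen; decoding the audio codes back to a
# waveform is handled internally by the EnCodec audio compression model.
from transformers import AutoProcessor, MusicgenForConditionalGeneration

processor = AutoProcessor.from_pretrained("facebook/musicgen-small")  # assumed checkpoint
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")

inputs = processor(text=["80s pop track with bassy drums and synth"], padding=True, return_tensors="pt")
audio_values = model.generate(**inputs, max_new_tokens=256)  # roughly 5 seconds of audio
sampling_rate = model.config.audio_encoder.sampling_rate
```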

Bark

Bark is a transformer-based text-to-speech model proposed by Suno AI in suno-ai/bark.
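
A hedged usage sketch (the checkpoint name and prompt are assumptions):

```python
# Text-to-speech with Bark; the output is a raw waveform tensor (typically 24 kHz audio).
from transformers import AutoProcessor, BarkModel

processor = AutoProcessor.from_pretrained("suno/bark-small")  # assumed checkpoint
model = BarkModel.from_pretrained("suno/bark-small")

inputs = processor("Hello, my dog is cute")
speech = model.generate(**inputs)
```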

MMS

The MMS model was proposed in Scaling Speech Technology to 1,000+ Languages by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau and Michael Auli.
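
The MMS speech-recognition checkpoints ship per-language adapter weights. A rough sketch, assuming the facebook/mms-1b-all checkpoint and French ("fra") as the target language:

```python
# MMS ASR with a language adapter; the CTC head size differs per language,
# hence ignore_mismatched_sizes=True.
import numpy as np
import torch
from transformers import AutoProcessor, Wav2Vec2ForCTC

model_id = "facebook/mms-1b-all"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id, target_lang="fra")
model = Wav2Vec2ForCTC.from_pretrained(model_id, target_lang="fra", ignore_mismatched_sizes=True)

waveform = np.zeros(16_000, dtype=np.float32)  # placeholder: one second of silence; use real 16 kHz mono audio
inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(processor.batch_decode(torch.argmax(logits, dim=-1))[0])
```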

EnCodec

The EnCodec neural codec model was proposed in High Fidelity Neural Audio Compression by Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi.
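
A minimal round-trip sketch (the checkpoint name and placeholder audio are assumptions):

```python
# Compress audio to discrete codes with EnCodec, then decode back to a waveform.
import torch
from transformers import AutoProcessor, EncodecModel

model = EncodecModel.from_pretrained("facebook/encodec_24khz")  # assumed checkpoint
processor = AutoProcessor.from_pretrained("facebook/encodec_24khz")

raw_audio = torch.zeros(24_000).numpy()  # placeholder: one second of silence at 24 kHz
inputs = processor(raw_audio=raw_audio, sampling_rate=24_000, return_tensors="pt")

encoded = model.encode(inputs["input_values"], inputs["padding_mask"])
audio = model.decode(encoded.audio_codes, encoded.audio_scales, inputs["padding_mask"])[0]
```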

InstructBLIP

The InstructBLIP model was proposed in InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi. InstructBLIP leverages the BLIP-2 architecture for visual instruction tuning.
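
A hedged sketch of instruction-following visual question answering (the checkpoint name, prompt and placeholder image are assumptions; the vicuna-7b checkpoint is large):

```python
# Ask InstructBLIP a question about an image.
from PIL import Image
from transformers import InstructBlipProcessor, InstructBlipForConditionalGeneration

processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-vicuna-7b")  # assumed checkpoint
model = InstructBlipForConditionalGeneration.from_pretrained("Salesforce/instructblip-vicuna-7b")

image = Image.new("RGB", (224, 224))  # placeholder; use a real image in practice
inputs = processor(images=image, text="What is unusual about this image?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
```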

UMT5

The UMT5 model was proposed in UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining by Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant.

MRA

The MRA model was proposed in Multi Resolution Analysis (MRA) for Approximate Self-Attention by Zhanpeng Zeng, Sourav Pal, Jeffery Kline, Glenn M Fung, and Vikas Singh.

ViViT

The ViViT model was proposed in ViViT: A Video Vision Transformer by Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, Cordelia Schmid. The paper proposes one of the first successful pure-transformer based set of models for video understanding.
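
A rough video-classification sketch (the checkpoint name, frame count and placeholder clip are assumptions):

```python
# Classify a 32-frame clip with ViViT.
import numpy as np
import torch
from transformers import VivitImageProcessor, VivitForVideoClassification

processor = VivitImageProcessor.from_pretrained("google/vivit-b-16x2-kinetics400")  # assumed checkpoint
model = VivitForVideoClassification.from_pretrained("google/vivit-b-16x2-kinetics400")

video = [np.zeros((224, 224, 3), dtype=np.uint8) for _ in range(32)]  # placeholder frames; use a real clip
inputs = processor(video, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```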

Python 3.7

The last version to support Python 3.7 was 4.30.x. Python 3.7 reached end of life on June 27, 2023 and is no longer supported by the Python Software Foundation.

PyTorch 1.9

The last version to support PyTorch 1.9 was 4.30.x. PyTorch 1.9 is now more than two years old, and we're looking forward to using features available in PyTorch 1.10 and up, so PyTorch 1.9 is no longer supported as of v4.31.

RoPE scaling

This PR adds RoPE scaling to the LLaMA and GPT-NeoX families of models, making it possible to extrapolate beyond the original maximum sequence length (e.g. 2048 tokens for LLaMA) without fine-tuning. It offers two strategies (a minimal configuration sketch follows the list):

  • Linear scaling
  • Dynamic NTK scaling
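
A minimal configuration sketch, assuming a LLaMA-family checkpoint and a 2x scaling factor:

```python
# Enable RoPE scaling by passing a rope_scaling dict through from_pretrained;
# the dict is forwarded to the model config.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # assumed checkpoint name
    rope_scaling={"type": "dynamic", "factor": 2.0},  # or {"type": "linear", "factor": 2.0}
)
```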

Agents

Tools now return a type that is specific to agents. This type can return a serialized version of itself (a string) that points either to a file on disk or to the object's content. This should make interaction with text-based systems much simpler.

Tied weights load

Models with potentially tied weights dropped some keys from the state dict even when the weights were not tied. This has now been fixed and, more generally, the whole experience of loading a model whose state dict doesn't match exactly should be improved in this release.

Whisper word-level timestamps

This PR adds a method of predicting timestamps at the word (or even token) level, by analyzing the cross-attentions and applying dynamic time warping.
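
A hedged sketch through the ASR pipeline (the model size and audio file name are assumptions):

```python
# Word-level timestamps with Whisper, derived from cross-attentions via dynamic time warping.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")
result = asr("sample.flac", return_timestamps="word")
print(result["chunks"])  # each chunk pairs a word with its (start, end) timestamps
```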

Auto model addition

A new auto class, AutoModelForTextEncoding, has been added. It is meant to be used when you want to extract the text encoder from an encoder-decoder architecture.
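
A short sketch, assuming a T5 checkpoint:

```python
# Pull only the text encoder out of an encoder-decoder checkpoint.
from transformers import AutoModelForTextEncoding, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
encoder = AutoModelForTextEncoding.from_pretrained("t5-small")  # resolves to the encoder-only class, e.g. T5EncoderModel

inputs = tokenizer("Studies have shown that owning a dog is good for you", return_tensors="pt")
hidden_states = encoder(**inputs).last_hidden_state
```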

Model deprecation

Transformers is growing quickly, and to ease the maintenance burden on our side a bit, we have decided to deprecate models that see little use. Those models will never actually disappear from the library, but we will stop testing them and accepting PRs that modify them.

The criterion for identifying models to deprecate was fewer than 1,000 unique downloads over the last 30 days, for models that are at least one year old. The list of deprecated models is:

  • BORT
  • M-CTC-T
  • MMBT
  • RetriBERT
  • TAPEX
  • Trajectory Transformer
  • VAN

Breaking changes

Fixes an issue with stripped spaces for the T5 family of tokenizers. If this negatively impacts inference or training with your models, please let us know by opening an issue.

Bugfixes and improvements

  • add trust_remote_code option to CLI download cmd by @radames in #24097

  • Fix typo in Llama docstrings by @Kh4L in #24020

  • Avoid GPT-2 daily CI job OOM (in TF tests) by @ydshieh in #24106

  • [Lllama] Update tokenization code to ensure parsing of the special tokens [core] by @ArthurZucker in #24042

  • PLAM => PaLM by @xingener in #24129

  • [bnb] Fix bnb config json serialization by @younesbelkada in #24137

  • Correctly build models and import call_context for older TF versions by @Rocketknight1 in #24138

  • Generate: PT's top_p enforces min_tokens_to_keep when it is 1 by @gante in #24111

  • fix bugs with trainer by @pacman100 in #24134

  • Fix TF Rag OOM issue by @ydshieh in #24122

  • Fix SAM OOM issue on CI by @ydshieh in #24125

  • Fix XGLM OOM on CI by @ydshieh in #24123

  • [SAM] Fix sam slow test by @younesbelkada in #24140

  • [lamaTokenizerFast] Update documentation by @ArthurZucker in #24132

  • [BlenderBotSmall] Update doc example by @ArthurZucker in #24092

  • Fix Pipeline CI OOM issue by @ydshieh in #24124

  • [documentation] grammatical fixes in image_classification.mdx by @LiamSwayne in #24141

  • Fix typo in streamers.py by @freddiev4 in #24144

  • [tests] fix bitsandbytes import issue by @stas00 in #24151

  • Avoid OOM in doctest CI by @ydshieh in #24139

  • Fix Wav2Vec2 CI OOM by @ydshieh in #24190

  • Fix push to hub by @NielsRogge in #24187

  • Change ProgressCallback to use dynamic_ncols=True by @gmlwns2000 in #24101

  • [i18n]Translated "attention.mdx" to korean by @kihoon71 in #23878

  • Generate: force caching on the main model, in assisted generation by @gante in #24177

  • Fix device issue in OpenLlamaModelTest::test_model_parallelism by @ydshieh in #24195

  • Update GPTNeoXLanguageGenerationTest by @ydshieh in #24193

  • typo: fix typos in CONTRIBUTING.md and deepspeed.mdx by @zsj9509 in #24184

  • Generate: detect special architectures when loaded from PEFT by @gante in #24198

  • 🌐 [i18n-KO] Translated tasks_summary.mdx to Korean by @kihoon71 in #23977

  • 🚨🚨🚨 Replace DataLoader logic for Accelerate in Trainer, remove unneeded tests 🚨🚨🚨 by @muellerzr in #24028

  • Fix _load_pretrained_model by @SunMarc in #24200

  • Fix steps bugs in no trainer examples by @Ethan-yt in #24197

  • Skip RWKV test in past CI by @ydshieh in #24204

  • Remove unnecessary aten::to overhead in llama by @fxmarty in #24203

  • Update WhisperForAudioClassification doc example by @ydshieh in #24188

  • Finish dataloader integration by @muellerzr in #24201

  • Add the number of model test failures to slack CI report by @ydshieh in #24207

  • fix: TextIteratorStreamer cannot work with pipeline by @yuanwu2017 in #23641

  • Update (TF)SamModelIntegrationTest by @ydshieh in #24199

  • Improving error message when using use_safetensors=True. by @Narsil in #24232

  • Safely import pytest in testing_utils.py by @amyeroberts in #24241

  • fix overflow when training mDeberta in fp16 by @sjrl in #24116

  • deprecate use_mps_device by @pacman100 in #24239

  • Tied params cleanup by @sgugger in #24211

  • [Time Series] use mean scaler when scaling is a boolean True by @kashif in #24237

  • TF: standardize test_model_common_attributes for language models by @gante in #23457

  • Generate: GenerationConfig can overwrite attributes at from_pretrained time by @gante in #24238

  • Add torch >=1.12 requirement for Tapas by @ydshieh in #24251

  • Update urls in warnings for rich rendering by @IvanReznikov in #24136

  • Fix how we detect the TF package by @Rocketknight1 in #24255

  • Stop storing references to bound methods via tf.function by @Rocketknight1 in #24146

  • Skip GPT-J fx tests for torch < 1.12 by @ydshieh in #24256

  • docs wrt using accelerate launcher with trainer by @pacman100 in #24250

  • update FSDP save and load logic by @pacman100 in #24249

  • Fix URL in comment for contrastive loss function by @taepd in #24271

  • QA doc: import torch before it is used by @ByronHsu in #24228

  • Skip some TQAPipelineTests tests in past CI by @ydshieh in #24267

  • TF: CTRL with native embedding layers by @gante in #23456

  • Adapt Wav2Vec2 conversion for MMS lang identification by @patrickvonplaten in #24234

  • Update check of core deps by @sgugger in #24277

  • Pix2StructImageProcessor requires torch>=1.11.0 by @ydshieh in #24270

  • Fix Debertav2 embed_proj by @WissamAntoun in #24205

  • Clean up old Accelerate checks by @sgugger in #24279

  • Fix bug in slow tokenizer conversion, make it a lot faster by @stephantul in #24266

  • Fix check_config_attributes: check all configuration classes by @ydshieh in #24231

  • Fix LLaMa beam search when using parallelize by @FeiWang96 in #24224

  • remove unused is_decoder parameter in DetrAttention by @JayL0321 in #24226

  • Split common test from core tests by @sgugger in #24284

  • [fix] bug in BatchEncoding.getitem by @flybird1111 in #24293

  • Fix image segmentation tool bug by @amyeroberts in #23897

  • [Docs] Improve docs for MMS loading of other languages by @patrickvonplaten in #24292

  • Update README_zh-hans.md by @CooperFu in #24181

  • deepspeed init during eval fix by @pacman100 in #24298

  • [EnCodec] Changes for 32kHz ckpt by @sanchit-gandhi in #24296

  • [Docs] Fix the paper URL for MMS model by @hitchhicker in #24302

  • Update tokenizer_summary.mdx (grammar) by @belladoreai in #24286

  • Beam search type by @jprivera44 in #24288

  • Make can_generate as class method by @ydshieh in #24299

  • Update test versions on README.md by @sqali in #24307

  • [SwitchTransformers] Fix return values by @ArthurZucker in #24300

  • Fix functional TF Whisper and modernize tests by @Rocketknight1 in #24301

  • Big TF test cleanup by @Rocketknight1 in #24282

  • Fix ner average grouping with no groups by @Narsil in #24319

  • Fix ImageGPT doc example by @amyeroberts in #24317

  • Add test for proper TF input signatures by @Rocketknight1 in #24320

  • Adding ddp_broadcast_buffers argument to Trainer by @TevenLeScao in #24326

  • error bug on saving distributed optim state when using data parallel by @xshaun in #24108

  • 🌐 [i18n-KO] Fixed tutorial/preprocessing.mdx by @sim-so in #24156

  • pin apex to a speicifc commit (for DeepSpeed CI docker image) by @ydshieh in #24351

  • byebye Hub connection timeout by @ydshieh in #24350

  • Clean up disk sapce during docker image build for transformers-pytorch-gpu by @ydshieh in #24346

  • Fix KerasMetricCallback: pass generate_kwargs even if use_xla_generation is False by @Kripner in #24333

  • Fix device issue in SwitchTransformers by @ydshieh in #24352

  • Update MMS integration docs by @vineelpratap in #24311

  • Make AutoFormer work with previous torch version by @ydshieh in #24357

  • Fix ImageGPT doctest by @amyeroberts in #24353

  • Fix link to documentation in Install from Source by @SoyGema in #24336

  • docs: add BentoML to awesome-transformers by @aarnphm in #24344

  • [Doc Fix] Fix model name path in the transformers doc for AutoClasses by @riteshghorse in #24329

  • Fix the order in GPTNeo's docstring by @qgallouedec in #24358

  • Respect explicitly set framework parameter in pipeline by @denis-ismailaj in #24322

  • Allow passing kwargs through to TFBertTokenizer by @Rocketknight1 in #24324

  • Fix resuming PeftModel checkpoints in Trainer by @llohann-speranca in #24274

  • TensorFlow CI fixes by @Rocketknight1 in #24360

  • Update tiny models for pipeline testing. by @ydshieh in #24364

  • [modelcard] add audio classification to task list by @sanchit-gandhi in #24363

  • [Whisper] Make tests faster by @sanchit-gandhi in #24105

  • Rename test to be more accurate by @sgugger in #24374

  • Add a check in ImageToTextPipeline._forward by @ydshieh in #24373

  • [Tokenizer doc] Clarification about add_prefix_space by @ArthurZucker in #24368

  • style: add BitsAndBytesConfig repr function by @aarnphm in #24331

  • Better test name and enable pipeline test for pix2struct by @ydshieh in #24377

  • Skip a tapas (tokenization) test in past CI by @ydshieh in #24378

  • [Whisper Docs] Nits by @ArthurZucker in #24367

  • [GPTNeoX] Nit in config by @ArthurZucker in #24349

  • [Wav2Vec2 - MMS] Correct directly loading adapters weights by @patrickvonplaten in #24335

  • Migrate doc files to Markdown. by @sgugger in #24376

  • Update deprecated torch.ger by @kit1980 in #24387

  • [docs] Fix NLLB-MoE links by @stevhliu in #24388

  • Add ffmpeg for doc_test_job on CircleCI by @ydshieh in #24397

  • byebye Hub connection timeout - Recast by @ydshieh in #24399

  • fix type annotation for debug arg by @Bearnardd in #24033

  • [Trainer] Fix optimizer step on PyTorch TPU by @cowanmeg in #24389

  • Fix gradient checkpointing + fp16 autocast for most models by @younesbelkada in #24247

  • Clean up dist import by @muellerzr in #24402

  • Check auto mappings could be imported via from transformers by @ydshieh in #24400

  • Remove redundant code from TrainingArgs by @muellerzr in #24401

  • Explicit arguments in from_pretrained by @ydshieh in #24306

  • [ASR pipeline] Check for torchaudio by @sanchit-gandhi in #23953

  • TF safetensors reduced mem usage by @Rocketknight1 in #24404

  • Skip test_conditional_generation_pt_pix2struct in Past CI (torch < 1.11) by @ydshieh in #24417

  • [bnb] Fix bnb serialization issue with new release by @younesbelkada in #24416

  • Revert "Fix gradient checkpointing + fp16 autocast for most models" by @younesbelkada in #24420

  • Fix save_cache version in config.yml by @ydshieh in #24419

  • Update RayTune doc link for Hyperparameter tuning by @JoshuaEPSamuel in #24422

  • TF CI fix for Segformer by @Rocketknight1 in #24426

  • Refactor hyperparameter search backends by @alexmojaki in #24384

  • Clarify batch size displayed when using DataParallel by @sgugger in #24430

  • Save site-packages as cache in CircleCI job by @ydshieh in #24424

  • [llama] Fix comments in weights converter by @weimingzha0 in #24436

  • [Trainer] Fix .to call on 4bit models by @younesbelkada in #24444

  • fix the grad_acc issue at epoch boundaries by @pacman100 in #24415

  • Replace python random with torch.rand to enable dynamo.export by @BowenBao in #24434

  • Fix typo by @siryuon in #24440

  • Fix some TFWhisperModelIntegrationTests by @ydshieh in #24428

  • fixes issue when saving fsdp via accelerate's FSDP plugin by @pacman100 in #24446

  • Allow dict input for audio classification pipeline by @sanchit-gandhi in #23445

  • Update JukeboxConfig.from_pretrained by @ydshieh in #24443

  • Improved keras imports by @Rocketknight1 in #24448

  • add missing alignment_heads to Whisper integration test by @hollance in #24487

  • Fix tpu_metrics_debug by @cowanmeg in #24452

  • Update AlbertModel type annotation by @amyeroberts in #24450

  • [pipeline] Fix str device issue by @younesbelkada in #24396

  • when resume from peft checkpoint, the model should be trainable by @sywangyi in #24463

  • deepspeed z1/z2 state dict fix by @pacman100 in #24489

  • Update InstructBlipModelIntegrationTest by @ydshieh in #24490

  • Update token_classification.md by @condor-cp in #24484

  • Add support for for loops in python interpreter by @sgugger in #24429

  • [InstructBlip] Add accelerate support for instructblip by @younesbelkada in #24488

  • Compute dropout_probability only in training mode by @ydshieh in #24486

  • Fix 'local_rank' AttiributeError in Trainer class by @mocobeta in #24297

  • Compute dropout_probability only in training mode (SpeechT5) by @ydshieh in #24498

  • Fix link in utils by @SoyGema in #24501

  • 🚨🚨 Fix group beam search by @hukuda222 in #24407

  • Generate: group_beam_search requires diversity_penalty>0.0 by @gante in #24456

  • Generate: min_tokens_to_keep has to be >= 1 by @gante in #24453

  • Fix TypeError: Object of type int64 is not JSON serializable by @xiaoli in #24340

  • Fix poor past ci by @ydshieh in #24485

  • 🌐 [i18n-KO] Translated tflite.mdx to Korean by @0525hhgus in #24435

  • use accelerate autocast in jit eval path, since mix precision logic is… by @sywangyi in #24460

  • Update hyperparameter_search.py by @pacman100 in #24515

  • [T5] Add T5ForQuestionAnswering and MT5ForQuestionAnswering by @sjrl in #24481

  • set model to training mode before accelerate.prepare by @sywangyi in #24520

  • Update huggingface_hub commit sha by @ydshieh in #24527

  • Find module name in an OS-agnostic fashion by @sgugger in #24526

  • Fix LR scheduler based on bs from auto bs finder by @muellerzr in #24521

  • [Mask2Former] Remove SwinConfig by @NielsRogge in #24259

  • Allow backbones not in backbones_supported - Maskformer Mask2Former by @amyeroberts in #24532

  • Fix Typo by @tony9402 in #24530

  • Finishing tidying keys to ignore on load by @sgugger in #24535

  • Add bitsandbytes support for gpt2 models by @DarioSucic in #24504

  • ⚠️ Time to say goodbye to py37 by @ydshieh in #24091

  • Unpin DeepSpeed and require DS >= 0.9.3 by @ydshieh in #24541

  • Allow for warn_only selection in enable_full_determinism by @Frank995 in #24496

  • Fix typing annotations for FSDP and DeepSpeed in TrainingArguments by @mryab in #24549

  • Update PT/TF weight conversion after #24030 by @ydshieh in #24547

  • Update EncodecIntegrationTest by @ydshieh in #24553

  • [gpt2-int8] Add gpt2-xl int8 test by @younesbelkada in #24543

  • Fix processor init bug if image processor undefined by @amyeroberts in #24554

  • [InstructBlip] Add instruct blip int8 test by @younesbelkada in #24555

  • Update PT/Flax weight conversion after #24030 by @ydshieh in #24556

  • Make PT/Flax tests could be run on GPU by @ydshieh in #24557

  • Update masked_language_modeling.md by @condor-cp in #24560

  • Fixed OwlViTModel inplace operations by @pasqualedem in #24529

  • Update old existing feature extractor references by @amyeroberts in #24552

  • Fix Typo by @tony9402 in #24559

  • Fix annotations by @tony9402 in #24571

  • Docs: 4 bit doc corrections by @gante in #24572

  • Revert "Fix typing annotations for FSDP and DeepSpeed in TrainingArguments" by @sgugger in #24574

  • Update some torchscript tests after #24505 by @ydshieh in #24566

  • Removal of deprecated vision methods and specify deprecation versions by @amyeroberts in #24570

  • Fix ESM models buffers by @sgugger in #24576

  • Check all objects are equally in the main __init__ file by @ydshieh in #24573

  • Fix annotations by @tony9402 in #24582

  • fix peft ckpts not being pushed to hub by @pacman100 in #24578

  • Udate link to RunHouse hardware setup documentation. by @BioGeek in #24590

  • Show a warning for missing attention masks when pad_token_id is not None by @hackyon in #24510

  • Make (TF) CI faster (test only a subset of model classes) by @ydshieh in #24592

  • Speed up TF tests by reducing hidden layer counts by @Rocketknight1 in #24595

  • [several models] improve readability by @stas00 in #24585

  • Use protobuf 4 by @ydshieh in #24599

  • Limit Pydantic to V1 in dependencies by @lig in #24596

  • 🌐 [i18n-KO] Translated perplexity.mdx to Korean by @HanNayeoniee in #23850

  • [Time-Series] Added blog-post to tips by @elisim in #24482

  • Pin Pillow for now by @ydshieh in #24633

  • Fix loading dataset docs link in run_translation.py example by @SoyGema in #24594

  • Generate: multi-device support for contrastive search by @gante in #24635

  • Generate: force cache with inputs_embeds forwarding by @gante in #24639

  • precompiled_charsmap checking before adding to the normalizers' list for XLNetTokenizerFast conversion. by @shahad-mahmud in #24618

  • Fix audio feature extractor deps by @sanchit-gandhi in #24636

  • llama fp16 torch.max bug fix by @prathikr in #24561

  • documentation_tests.txt - sort filenames alphabetically by @amyeroberts in #24647

  • Update warning messages reffering to post_process_object_detection by @rafaelpadilla in #24649

  • Add finetuned_from property in the autogenerated model card by @sgugger in #24528

  • Make warning disappear for remote code in pipelines by @sgugger in #24603

  • Fix EncodecModelTest::test_multi_gpu_data_parallel_forward by @ydshieh in #24663

  • Fix VisionTextDualEncoderIntegrationTest by @ydshieh in #24661

  • Add is_torch_mps_available function to utils by @NripeshN in #24660

  • Unpin huggingface_hub by @ydshieh in #24667

  • Fix model referenced and results in documentation. Model mentioned was inaccessible by @rafaelpadilla in #24609

  • Add Nucleotide Transformer notebooks and restructure notebook list by @Rocketknight1 in #24669

  • LlamaTokenizer should be picklable by @icyblade in #24681

  • Add dropouts to GPT-NeoX by @ZHAOTING in #24680

  • DeepSpeed/FSDP ckpt saving utils fixes and FSDP training args fixes by @pacman100 in #24591

  • Avoid import sentencepiece_model_pb2 in utils.__init__.py by @ydshieh in #24689

  • Fix integration with Accelerate and failing test by @muellerzr in #24691

  • [MT5] Fix CONFIG_MAPPING issue leading it to load umt5 class by @ArthurZucker in #24678

  • Fix flaky test_for_warning_if_padding_and_no_attention_mask by @ydshieh in #24706

  • Whisper: fix prompted max length by @gante in #24666

  • Enable conversational pipeline for GPTSw3Tokenizer by @saattrupdan in #24648

  • [T5] Adding model_parallel = False to T5ForQuestionAnswering and MT5ForQuestionAnswering by @sjrl in #24684

  • Docs: change some input_ids doc reference from BertTokenizer to AutoTokenizer by @gante in #24730

  • add link to accelerate doc by @SunMarc in #24601

  • [Patch-t5-tokenizer] Patches the changes on T5 to make sure previous behaviour is still valide for beginning of words by @ArthurZucker in #24622

  • Fix typo in LocalAgent by @jamartin9 in #24736

  • fix: Text splitting in the BasicTokenizer by @connor-henderson in #22280

  • Docs: add kwargs type to fix formatting by @gante in #24733

  • add gradient checkpointing for distilbert by @jordane95 in #24719

  • Skip keys not in the state dict when finding mismatched weights by @sgugger in #24749

  • Fix non-deterministic Megatron-LM checkpoint name by @janEbert in #24674

  • [InstructBLIP] Fix bos token of LLaMa checkpoints by @NielsRogge in #24492

  • Skip some slow tests for doctesting in PRs (Circle)CI by @ydshieh in #24753

  • Fix lr scheduler not being reset on reruns by @muellerzr in #24758

  • 🐛 Handle empty gen_kwargs for seq2seq trainer prediction_step function by @gkumbhat in #24759

  • Allow existing configs to be registered by @sgugger in #24760

  • Unpin protobuf in docker file (for daily CI) by @ydshieh in #24761

  • Fix eval_accumulation_steps leading to incorrect metrics by @muellerzr in #24756

  • Add MobileVitV2 to doctests by @amyeroberts in #24771

  • Docs: Update logit processors call docs by @gante in #24729

  • Replacement of 20 asserts with exceptions by @Baukebrenninkmeijer in #24757

  • Update default values of bos/eos token ids in CLIPTextConfig by @ydshieh in #24773

  • Fix pad across processes dim in trainer and not being able to set the timeout by @muellerzr in #24775

  • gpt-bigcode: avoid zero_ to support Core ML by @pcuenca in #24755

  • Remove WWT from README by @LysandreJik in #24672

  • Rm duplicate pad_across_processes by @muellerzr in #24780

  • Revert "Unpin protobuf in docker file (for daily CI)" by @ydshieh in #24800

  • Removing unnecessary device=device in modeling_llama.py by @Liyang90 in #24696

  • [fix] Change the condition of ValueError in "convert_checkpoint_from_transformers_to_megatron" by @SeongBeomLEE in #24769

  • [DOC] Clarify relationshi load_best_model_at_end and save_total_limit by @BramVanroy in #24614

  • Upgrade jax/jaxlib/flax pin versions by @ydshieh in #24791

  • Fix MobileVitV2 doctest checkpoint by @amyeroberts in #24805

  • Skip torchscript tests for MusicgenForConditionalGeneration by @ydshieh in #24782

  • Generate: add SequenceBiasLogitsProcessor by @gante in #24334

  • Add accelerate version in transformers-cli env by @amyeroberts in #24806

  • Fix typo 'submosules' by @dymil in #24809

  • Remove Falcon docs for the release until TGI is ready by @Rocketknight1 in #24808

  • Update setup.py to be compatible with pipenv by @georgiemathews in #24789

  • Use _BaseAutoModelClass's register method by @fadynakhla in #24810

  • Run hub tests by @sgugger in #24807

  • Copy code when using local trust remote code by @sgugger in #24785

  • Fixing double use_auth_token.pop (preventing private models from being visible). by @Narsil in #24812

  • set correct model input names for gptsw3tokenizer by @DarioSucic in #24788

  • Check models used for common tests are small by @sgugger in #24824

  • [🔗 Docs] Fixed Incorrect Migration Link by @kadirnar in #24793

  • deprecate sharded_ddp training argument by @statelesshz in #24825

  • 🌐 [i18n-KO] Translated custom_tools.mdx to Korean by @sim-so in #24580

  • Remove unused code in GPT-Neo by @namespace-Pt in #24826

  • Add Multimodal heading and Document question answering in task_summary.mdx by @y3sar in #23318

  • Fix is_vision_available by @ydshieh in #24853

  • Fix comments for _merge_heads by @bofenghuang in #24855

  • fix broken links in READMEs by @younesbelkada in #24861

  • Add TAPEX to the list of deprecated models by @sgugger in #24859

  • Fix token pass by @sgugger in #24862

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @hollance
    • [WIP] add EnCodec model (#23655)
    • add word-level timestamps to Whisper (#23205)
    • add missing alignment_heads to Whisper integration test (#24487)
  • @sim-so
    • 🌐 [i18n-KO] Fixed tutorial/preprocessing.mdx (#24156)
    • 🌐 [i18n-KO] Translated custom_tools.mdx to Korean (#24580)
  • @novice03
    • Add Multi Resolution Analysis (MRA) (New PR) (#24513)
  • @jegork
