v4.42.0: Gemma 2, RT-DETR, InstructBLIP, LLaVa-NeXT-Video, New Model Adder


New model additions

Gemma-2

The Gemma2 model was proposed in Gemma2: Open Models Based on Gemini Technology and Research by the Gemma2 Team at Google.
Gemma2 models are trained on 6T tokens and released in two sizes, 2B and 7B.

The abstract from the paper is the following:

This work introduces Gemma2, a new family of open language models demonstrating strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Gemma2 outperforms similarly sized open models on 11 out of 18 text-based tasks, and we present comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of our model development. We believe the responsible release of LLMs is critical for improving the safety of frontier models, and for enabling the next wave of LLM innovations.


RT-DETR

The RT-DETR model was proposed in DETRs Beat YOLOs on Real-time Object Detection by Wenyu Lv, Yian Zhao, Shangliang Xu, Jinman Wei, Guanzhong Wang, Cheng Cui, Yuning Du, Qingqing Dang, Yi Liu.

RT-DETR is an object detection model that stands for “Real-Time DEtection Transformer.” This model is designed to perform object detection tasks with a focus on achieving real-time performance while maintaining high accuracy. Leveraging the transformer architecture, which has gained significant popularity in various fields of deep learning, RT-DETR processes images to identify and locate multiple objects within them.
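
Inference follows the usual image-processor/model pattern. Below is a minimal sketch; the checkpoint name ("PekingU/rtdetr_r50vd") is assumed from the Hub release and may need adjusting:

import torch
import requests
from PIL import Image
from transformers import RTDetrImageProcessor, RTDetrForObjectDetection

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = RTDetrImageProcessor.from_pretrained("PekingU/rtdetr_r50vd")
model = RTDetrForObjectDetection.from_pretrained("PekingU/rtdetr_r50vd")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw logits and boxes into (score, label, box) predictions above a threshold
results = processor.post_process_object_detection(
    outputs, target_sizes=torch.tensor([image.size[::-1]]), threshold=0.3
)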


InstructBLIP

The InstructBLIP model was proposed in InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi. InstructBLIP leverages the BLIP-2 architecture for visual instruction tuning.

InstructBLIP uses the same architecture as BLIP-2 with a tiny but important difference: it also feeds the text prompt (instruction) to the Q-Former.
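
The processor makes this difference visible: the instruction is tokenized twice, once for the language model and once for the Q-Former. A minimal sketch, assuming the "Salesforce/instructblip-vicuna-7b" checkpoint (adjust to the checkpoint you want to use):

import requests
from PIL import Image
from transformers import InstructBlipProcessor, InstructBlipForConditionalGeneration

processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-vicuna-7b")
model = InstructBlipForConditionalGeneration.from_pretrained("Salesforce/instructblip-vicuna-7b")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, text="What is unusual about this image?", return_tensors="pt")
# Note the dedicated Q-Former inputs produced from the instruction text
print(inputs.keys())  # includes "qformer_input_ids" in addition to "input_ids"

out = model.generate(**inputs, max_new_tokens=30)
print(processor.batch_decode(out, skip_special_tokens=True)[0])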


LLaVa-NeXT-Video

The LLaVa-NeXT-Video model was proposed in LLaVA-NeXT: A Strong Zero-shot Video Understanding Model by Yuanhan Zhang, Bo Li, Haotian Liu, Yong Jae Lee, Liangke Gui, Di Fu, Jiashi Feng, Ziwei Liu, Chunyuan Li. LLaVa-NeXT-Video improves upon LLaVa-NeXT by fine-tuning on a mix of video and image datasets, thus increasing the model's performance on videos.

LLaVA-NeXT surprisingly shows strong performance in understanding video content in a zero-shot fashion, thanks to the AnyRes technique it uses. AnyRes naturally represents a high-resolution image as multiple images, and it generalizes to videos because a video can be treated as a set of frames (similar to the set of images in LLaVa-NeXT). The current version of LLaVA-NeXT-Video applies AnyRes and performs supervised fine-tuning (SFT) on top of LLaVA-NeXT on video data to achieve better video understanding. The model is the current SOTA among open-source models on the VideoMME benchmark.

New model adder

A very significant change makes its way into the transformers codebase, introducing a new way to add models to transformers. We recommend reading the description of the PR below, but here is the gist of it:

The diff_converter tool replaces our old "Copied from" statements while keeping our core transformers philosophy:

  • single model single file
  • explicit code
  • standardization of modeling code
  • readable and educative code
  • simple code
  • least amount of modularity

This additionally unlocks the ability to very quickly see the differences between new architectures that get developed. While many architectures are similar, the "single model, single file" policy can obfuscate the changes. With this diff converter, we want to make the changes between architectures very explicit.
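
As an illustration, here is a minimal, hypothetical diff file (the model name and the choice of Llama as the base are placeholders); the converter expands such a file into a full, self-contained modeling file:

from transformers.models.llama.modeling_llama import LlamaForCausalLM, LlamaModel

# Only what differs from the base model is written out; everything else is
# carried over verbatim by the diff converter into the generated modeling file.
class MyModelModel(LlamaModel):
    pass

class MyModelForCausalLM(LlamaForCausalLM):
    pass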

Tool-use and RAG model support

We've made major updates to our support for tool-use and RAG models. We can now automatically generate JSON schema descriptions for Python functions which are suitable for passing to tool models, and we've defined a standard API for tool models which should allow the same tool inputs to be used with many different models. Models will need updates to their chat templates to support the new API, and we're targeting the Nous-Hermes, Command-R and Mistral/Mixtral model families for support in the very near future. Please see the updated chat template docs for more information.
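
A minimal sketch of the new API, assuming a chat model whose template already supports tools (the checkpoint name below is a placeholder); the function's signature, type hints and docstring are converted to a JSON schema behind the scenes:

from transformers import AutoTokenizer

def get_current_temperature(location: str) -> float:
    """
    Get the current temperature at a location.

    Args:
        location: The location to get the temperature for, in the format "City, Country".
    """
    return 22.0  # stub: a real tool would call a weather API here

tokenizer = AutoTokenizer.from_pretrained("some-org/tool-use-model")  # placeholder

messages = [{"role": "user", "content": "What is the weather like in Paris, France?"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_temperature],  # schemas are generated from the functions
    tokenize=False,
    add_generation_prompt=True,
)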

If you are the owner of a model that supports tool use, but you're not sure how to update its chat template to support the new API, feel free to reach out to us for assistance with the update, for example on the Hugging Face Discord server. Ping Matt and yell key phrases like "chat templates" and "Jinja" and your issue will probably get resolved.

GGUF support

We have extended support for GGUF files to enable fine-tuning within the Python/HF ecosystem, before converting the models back for use with the GGUF/GGML/llama.cpp libraries.
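
Loading works through the gguf_file argument of from_pretrained, which dequantizes the checkpoint into a regular PyTorch model. A minimal sketch; the repo and file names below are placeholders:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"   # placeholder repo
gguf_file = "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"    # placeholder file name

# The GGUF checkpoint is dequantized and loaded as a regular PyTorch model,
# so it can be fine-tuned with the usual tooling before conversion back to GGUF.
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=gguf_file)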

Trainer improvements

A new optimizer is added in the Trainer.
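
The notes do not spell out the optimizer's name here; as a reminder, Trainer optimizers are selected by name through TrainingArguments, as in this sketch (shown with the long-standing "adamw_torch" value):

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    optim="adamw_torch",  # swap in the newly added optimizer's name here
)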

Quantization improvements

Several improvements are done related to quantization: a new cache (the quantized KV cache) is added, offering the ability to convert the cache of generative models, further reducing the memory requirements.
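
A minimal sketch of opting into the quantized cache at generation time, assuming the quanto backend is installed (the checkpoint is a placeholder for any generative model supporting the new Cache classes):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The quantized KV cache works by", return_tensors="pt")

# Quantize the KV cache to 4 bits with the quanto backend
out = model.generate(
    **inputs,
    max_new_tokens=20,
    cache_implementation="quantized",
    cache_config={"backend": "quanto", "nbits": 4},
)
print(tokenizer.decode(out[0], skip_special_tokens=True))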

Additionally, the documentation related to quantization is entirely redone with the aim of helping users choose which is the best quantization method.

Examples

New instance segmentation examples have been added by @qubvel.

Notable improvements

As a notable improvement to the HF vision models that leverage backbones, we enable leveraging HF pretrained model weights as backbones, with the following API:

from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation

# Use the pretrained weights of a Hub checkpoint ("microsoft/resnet-50") as the backbone
config = MaskFormerConfig(backbone="microsoft/resnet-50", use_pretrained_backbone=True)
model = MaskFormerForInstanceSegmentation(config)

Additionally, we thank @Cyrilvallez for diving into our generate method and greatly reducing the memory requirements.

Breaking changes

Remove ConversationalPipeline and Conversation object

Both the ConversationalPipeline and the Conversation object had been deprecated for a while, and are removed in this release, v4.42.

The TextGenerationPipeline is recommended for this use case, and it now accepts chat inputs in the same format as the OpenAI API: a list of messages with role and content keys.
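
A minimal sketch, assuming a chat model with a chat template (the checkpoint name is a placeholder):

from transformers import pipeline

pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# OpenAI-style list of messages instead of a raw prompt string
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me a joke."},
]
print(pipe(messages, max_new_tokens=50)[0]["generated_text"])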

Remove an accidental duplicate softmax application in FLAVA's attention

The duplicate softmax application in FLAVA's attention is removed. Outputs are likely to change slightly, hence the breaking-change flag (🚨).

Idefics2's ignore_index attribute of the loss is updated to -100

out_indices from timm being updated

Recent updates to timm changed the type of the attribute model.feature_info.out_indices. Previously, out_indices mirrored the type passed to the create_model call, i.e. either tuple or list; this value is now always a tuple.

As lists are more useful and consistent for us -- we cannot save tuples in configs, they must be converted to lists first -- we instead choose to cast out_indices so that it is always a list.

This is potentially a slight breaking change for users who create models and rely on out_indices being a tuple. As this only affects newly created models, and not models that are saved and reloaded (because of the config), the impact should be low.

Datasets referenced in the quantization config are updated to remove references to datasets with restrictive licenses.

Bugfixes and improvements

  • Add fixed resize and pad strategy for object detection by @qubvel in #30742
  • Enable dynamic resolution input for Swin Transformer and variants by @the-neural-networker in #30656
  • Add TokenClassification for Mistral, Mixtral and Qwen2 by @josephenguehard in #29878
  • FIX / Quantization: Fix Dockerfile build by @younesbelkada in #30890
  • Add support for torch.compile dynamic shapes by @warner-benjamin in #30560
  • LLaVa-Next: Update docs with batched inference by @zucchini-nlp in #30857
  • DeformableDETR two stage support bfloat16 by @DonggeunYu in #30907
  • add return_token_timestamps to WhisperProcessor by @kamilakesbi in #30812
  • Fix num_hidden_layers in initialization of new model in Mamba by @SrGonao in #30403
  • separate kwargs in processor (similar to #30193) by @Eric2i in #30905
  • fix for custom pipeline configuration by @not-lain in #29004
  • Add AutoFeatureExtractor support to Wav2Vec2ProcessorWithLM by @ylacombe in #28706
  • Fix a shape annotation and typos in mamba slow forward by @vasqu in #30691
  • tokenizer_class = "AutoTokenizer" Llava Family by @ArthurZucker in #30912
  • Introduce configured_state arg for accelerator_config by @muellerzr in #29781
  • Add torch.compile for Mistral by @zhenglongjiepheonix in #30642
  • [docs] Spanish translation of model_memory_anatomy.md by @aaronjimv in #30885
  • FIX / TST: Fix expected results on Mistral slow test (A10) by @younesbelkada in #30909
  • PaliGemma - fix processor with no input text by @hiyouga in #30916
  • CI: AMD MI300 tests fix by @mht-sharma in #30797
  • Enforce saving at end of training if saving option chosen by @muellerzr in #30160
  • fix: center_crop occasionally outputs off-by-one dimension matrix by @mattlbeck in #30934
  • [Benchmark] Reuse optimum-benchmark by @ydshieh in #30615
  • TST / Workflows: Get slack notifications for docker image build by @younesbelkada in #30891
  • Fix swin embeddings interpolation by @amyeroberts in #30936
  • Fix inhomogeneous shape error in example by @Zantares in #30434
  • update ruff version by @ArthurZucker in #30932
  • Update build ci image [push-ci-image] by @ArthurZucker in #30933
  • Update video-llava docs by @zucchini-nlp in #30935
  • Fix low cpu mem usage tests by @SunMarc in #30808
  • [doc] Add references to the fine-tuning blog and distil-whisper to Whisper. by @Vaibhavs10 in #30938
  • Avoid extra chunk in speech recognition by @jonatanklosko in #29539
  • [whisper] only trigger forced ids warning once by @sanchit-gandhi in #30966
  • Paligemma - fix slow tests, add bf16 and f16 slow tests by @molbap in #30851
  • Finally fix the missing new model failure CI report by @ydshieh in #30968
  • legacy to init the slow tokenizer when converting from slow was wrong by @ArthurZucker in #30972
  • Generation: get special tokens from model config by @zucchini-nlp in #30899
  • [Whisper] Strip prompt before finding common subsequence by @sanchit-gandhi in #27836
  • Fix link in Pipeline documentation by @junhl in #30948
  • [Mistral and friends] Update MLP by @NielsRogge in #31057
  • Paligemma causal attention mask by @molbap in #30967
  • Update object detection with latest resize and pad strategies by @qubvel in #30955
  • Using assistant in AutomaticSpeechRecognitionPipeline with different encoder size by @kamilakesbi in #30637
  • Push ci image by @ArthurZucker in #30982
  • test_custom_4d_attention_mask skip with sliding window attn by @poedator in #30833
  • Finish adding support for torch.compile dynamic shapes by @warner-benjamin in #30919
  • FIX / Docs: Minor changes in quantization docs by @younesbelkada in #30985
  • Fix accelerate failing tests by @SunMarc in #30836
  • [tests] add torch.use_deterministic_algorithms for XPU by @faaany in #30774
  • Add a check that warmup_setps is either 0 or >= 1 by @ymoslem in #30764
  • Update 4 MptIntegrationTests expected outputs by @ydshieh in #30989
  • [Port] TensorFlow implementation of Mistral by @ariG23498 in #29708
  • Remove deprecated properties in tokenization_nllb.py and tokenization_nllb_fast.py by @ymoslem in #29834
  • Bugfix: WandbCallback uploads initial model checkpoint by @mgerstgrasser in #30897
  • add prefix space ignored in llama #29625 by @itazap in #30964
  • Fix training speed regression introduced by "optimize VRAM for calculating pos_bias in LayoutLM v2, v3 (#26139)" by @kkoehncke
  • Do not trigger autoconversion if local_files_only by @Wauplin in #31004
  • pin uv==0.1.45 by @ydshieh in #31006
  • Perceiver interpolate position embedding by @g1y5x3 in #30979
  • [tests] make test_model_parallelism device-agnostic by @faaany in #30844
  • FIX / TST: Fix expected results on Mistral AWQ test by @SunMarc in #30971
  • allow multi-gpu by @ydshieh in #31011
  • Fix resume_download future warning by @Wauplin in #31007
  • Quantization / TST: Fix remaining quantization tests by @younesbelkada in #31000
  • save the list of new model failures by @ydshieh in #31013
  • added interpolation for vitmae model in pytorch as well as tf. by @bhuvanmdev in #30732
  • Add split special tokens by @itazap in #30772
  • Paligemma- fix devices and dtype assignments by @molbap in #31008
  • Redirect transformers_agents doc to agents by @aymeric-roucher in #31054
  • unpin uv by @ydshieh in #31055
  • Follow up: Fix link in dbrx.md by @eitanturok in #30514
  • Update feature request label in template by @amyeroberts in #30940
  • Fix quanto tests by @SunMarc in #31062
  • Fix pad_to_max_length Whisper by @ylacombe in #30787
  • skip test_model_parallelism for 2 model test classes by @ydshieh in #31067
  • use @main by @ydshieh in #31065
  • Remove ninja from docker image build by @ydshieh in #31080
  • fix "piano" typo by @clinty in #31027
  • Update quicktour.md to fix broken link to Glossary by @apalkk in #31072
  • Remove redundant backend checks in training_args.py by @kevint324 in #30999
  • fix from_pretrained in offline mode when model is preloaded in cache by @oOraph in #31010
  • Remove float64 cast for OwlVit and OwlV2 to support MPS device by @qubvel in #31071
  • Fix OWLv2 post_process_object_detection for multiple images by @qubvel in #31082
  • Fix typo in trainer.py by @taslimisina in #31048
  • [SuperPoint, PaliGemma] Update docs by @NielsRogge in #31025
  • Fix failing tokenizer tests by @LysandreJik in #31083
  • Watermark: fix tests by @zucchini-nlp in #30961
  • Docs / PEFT: Add PEFT API documentation by @younesbelkada in #31078
  • Render chat template tojson filter as unicode by @CISC in #31041
  • FIX: Add accelerate as a hard requirement by @younesbelkada in #31090
  • FIX / OPT: Fix OPT multi-GPU training for OPTForQuestionAnswering by @younesbelkada in #31092
  • skip test_multi_gpu_data_parallel_forward for vit and deit by @ydshieh in #31086
  • Fix PretrainedConfig docstring with deprecated resume_download by @albertvillanova in #31014
  • Fix DeepSpeed compatibility with weight_norm by @jonnyli1125 in #30881
  • TST: Fix instruct-blip tests by @younesbelkada in #31088
  • Docs / Quantization: Redirect deleted page by @younesbelkada in #31063
  • Deprecate low use models by @amyeroberts in #30781
  • Quantized KV cache: update quanto by @zucchini-nlp in #31052
  • FEAT: Add mistral v3 conversion script by @younesbelkada in #30981
  • Use HF_HUB_OFFLINE + fix has_file in offline mode by @Wauplin in #31016
  • Improve transformers-cli env reporting by @statelesshz in #31003
  • Fix env.py in cases where torch is not present by @Rocketknight1 in #31113
  • Fix faulty rstrip in module loading by @Rocketknight1 in #31108
  • Rm maintainer + migrate by @muellerzr in #31089
  • Fix nightly circleci by @ydshieh in #31114
  • FIX / Docs: Fix GPTQ expected number of bits by @younesbelkada in #31111
  • Add VLM generation default contributor by @gante in #31115
  • Add on_optimizer_step to callback options by @dhruvbpai in #31095
  • Cleanup docker build by @ydshieh in #31119
  • FIX / Quantization: Add extra validation for bnb config by @younesbelkada in #31135
  • fix get_scheduler when name is warmup_stable_decay by @zspo in #31128
  • Docs / Quantization: Replace all occurences of load_in_8bit with bnb config by @younesbelkada in #31136
  • Workflow: Remove IS_GITHUB_CI by @younesbelkada in #31147
  • helper by @ArthurZucker in #31152
  • pytest -rsfE by @ydshieh in #31140
  • Fix quantized cache output by @SunMarc in #31143
  • Update sam.md by @asifajrof in #31130
  • Quantization: Enhance bnb error message by @younesbelkada in #31160
  • [trainer] add sanity evaluation option by @SunMarc in #31146
  • Add streaming, various fixes by @aymeric-roucher in #30838
  • Added description of quantization_config by @vamsivallepu in #31133
  • Fix typo: use_safetenstors to use_safetensors by @CharlesCNorton in #31184
  • Remove copied froms for deprecated models by @amyeroberts in #31153
  • Token healing by @ahmed-moubtahij in #30081
  • [GemmaModel] fix small typo by @ArthurZucker in #31202
  • Fix Cannot convert [array()] to EagerTensor of dtype int64 by @pavi-ninjaac in #31109
  • Ignore non-causal mask in more cases with SDPA by @fxmarty in #30138
  • SlidingWindowCache: reduce differences to other Cache classes by @gante in #30970
  • Fix test_compile_static_cache by @ydshieh in #30991
  • fix the get_size_with_aspect_ratio in max_size situation by @SangbumChoi in #30902
  • Fix typo in utils by @Bojun-Feng in #31169
  • Rename sanity_evaluation to eval_on_start by @Qubitium in #31192
  • Wrong translation FR : Contents = Contenu by @jadechoghari in #31186
  • Cohere: Fix copied from by @younesbelkada in #31213
  • Set greater_is_better to False if metric_for_best_model ends with "loss" by @miivanov90 in #31142
  • Fix GPU OOM for mistral.py::Mask4DTestHard by @ydshieh in #31212
  • [docs] Spanish translation of tokenizer_summary.md by @aaronjimv in #31154
  • Pass device in Logits Processor's init by @zucchini-nlp in #29804
  • Fix sentence fragment within test comments by @DomHudson in #31218
  • fix(PatchTST): Wrong dropout used for PretainHead by @maxstrobel in #31117
  • Video-LLaVa: handle any number of frames by @zucchini-nlp in #31221
  • Add dynamic resolution input/interpolate position embedding to deit by @p-kris10 in #31131
  • fix bf16 issue in text classification pipeline by @chujiezheng in #30996
  • Fix pipeline tests - torch imports by @amyeroberts in #31227
  • Add new line switch before logging ***** Running {description} ***** by @jacklanda in #31225
  • add no split modules for xlmrobertaxl by @ManuelFay in #31223
  • Fix MistralIntegrationTest by @ydshieh in #31231
  • Blip: Deprecate BlipModel by @younesbelkada in #31235
  • Move out common backbone config param validation by @amyeroberts in #31144
  • Upload (daily) CI results to Hub by @ydshieh in #31168
  • Specify dtype=torch.bool to avoid xla error by @ysulsky in #31191
  • Fixing name 'torch' is not defined in bitsandbytes integration by @jamesbraza in #31243
  • Benchmark GitHub Actions workflow by @ydshieh in #31163
  • Early labels validation by @amyeroberts in #31240
  • doc: add info about wav2vec2 bert in older wav2vec2 models. by @Vaibhavs10 in #31120
  • enable deterministic mode for npu by @statelesshz in #31253
  • Add missing Flaubert tokenizer tests by @bastrob in #30492
  • Fix circular reference issue in CLIPTokenizerFast by @dhaivat1729 in #31075
  • Add condition to benchmark job in push-important-models.yml by @ydshieh in #31259
  • Skip failing JetMOE generation tests by @amyeroberts in #31266
  • no need for explicit EXTRA_TOKENS in processing_paligemma.py by @grahamannett in #31022
  • [SwitchTransformer] Significant performance improvement on MoE blocks by @ranggihwang in #31173
  • fix loading special_tokens_map_file by @ZhiyuanChen in #31012
  • Make mamba use cache by @zucchini-nlp in #31116
  • Generation: fix handling of special tokens by @zucchini-nlp in #31254
  • Switch from cached_download to hf_hub_download in remaining occurrences by @Wauplin in #31284
  • fix: str should be used not int when setting env variables by @statelesshz in #31272
  • Fix _save_tpu: use _maybe_convert_to_cpu instead of to cpu. by @baoleai in #31264
  • fix accelerate tests for roberta xl by @SunMarc in #31288
  • Enable dynamic resolution input for Beit by @OmarManzoor in #31053
  • Mark MobileNetV1ModelTest::test_batching_equivalence as flaky by @amyeroberts in #31258
  • Pipeline VQA: Add support for list of images and questions as pipeline input by @BlacCod in #31217
  • Fix SwinLayer / DonutSwinLayer / ClapAudioLayer attention mask device by @gorodnitskiy in #31295
  • Update text-to-speech.md by @jaguaryang in #31269
  • Fixed Wav2Vec2ProcessorWithLM decoding error by @karicotiza in #31188
  • Fix jetmoe model by @Cyrilvallez in #31279
  • Extend save_pretrained to offloaded models by @blbadger in #27412
  • Implement JSON dump conversion for torch_dtype in TrainingArguments by @junrae6454 in #31224
  • interpolation added for TVP. by @bhuvanmdev in #30863
  • Rename test_model_common_attributes -> test_model_get_set_embeddings by @amyeroberts in #31321
  • Use unused prepare_img() function in dinov2 conversion script by @IbrahimAmin1 in #31335
  • docs: fix style by @imba-tjd in #31340
  • Fix paligemma inverted mask by @molbap in #31207
  • docs/zh: fix style by @imba-tjd in #31334
  • Decorators for deprecation and named arguments validation by @qubvel in #30799
  • Improve error msg when using bitsandbytes by @SunMarc in #31350
  • Fix Cohere CI by @ydshieh in #31263
  • Fix gradio tool demos by @aymeric-roucher in #31230
  • Fast image processor by @amyeroberts in #28847
  • Add french translation of AutoBackbone by @jadechoghari in #31300
  • Add support to declare imports for code agent by @JasonZhu1313 in #31355
  • Fix idefics cache by @zucchini-nlp in #31377
  • [Bug Fix] Renamed loss to losses to suppress UnboundLocalError by @her0e1c1 in #31365
  • docs: fix broken link by @imba-tjd in #31370
  • backbone_utils - fix relative import by @amyeroberts in #31382
  • README underline between badges fix by @novialriptide in #31376
  • Update comment in modeling_utils.py by @inf3rnus in #31299
  • Use huggingface_hub helper function to split state dict by @SunMarc in #31091
  • Change JSON serialization to custom json.dumps by @junrae6454 in #31100
  • feat(ci): add trufflehog secrets detection by @McPatate in #31344
  • [QoL fix] [Image processing] Add warning on assumption of channel dim and avoid infering when inputs are PIL.Image by @aliencaocao in #31364
  • Make chat templates part of ProcessorMixin by @Rocketknight1 in #30744
  • add initial design for uniform processors + align model by @molbap in #31197
  • Add missing French translation of tutoriel_pipeline.md by @jadechoghari in #31396
  • Temporarily pin datasets upper version to fix CI by @albertvillanova in #31407
  • Support Clip QKV for MPT by @akakakakakaa in #31307
  • Pin datasets<2.20.0 for examples by @amyeroberts in #31417
  • Fix MusicGen SDPA by @ylacombe in #31208
  • Set seed for M4T retain grad test by @ylacombe in #31419
  • Fix SpeechT5 decoder_attention_mask shape by @ylacombe in #28071
  • Change potential inputs_embeds padding logger.warning to logger.warning_once by @naimenz in #31411
  • Remove duplicate image processor in auto map by @amyeroberts in #31383
  • Install the tensorflow example requirements in docker by @amyeroberts in #31428
  • Remove empty create_and_test_config_common_properties tests by @amyeroberts in #31359
  • xpu: support xpu backend from stock pytorch (>=2.4) by @dvrogozh in #31238
  • Musicgen special tokens in tensors by @zucchini-nlp in #31420
  • Fix Bark logits processors device misplacement by @ylacombe in #31416
  • Rename misnamed image processor test files by @amyeroberts in #31430
  • Generate: fix tokenizer being popped twice by @gante in #31427
  • [tests] make TestDeepSpeedModelZoo device-agnostic by @faaany in #31402
  • Support multiple validation datasets when dataloader_persistent_workers=True by @bastienlc in #30627
  • Pass datasets trust_remote_code by @albertvillanova in #31406
  • simple fix by @tokenizer-decode in #31456
  • Fix typing errors in Qwen2ForTokenClassification by @kevinhu in #31440
  • Agents: Improve python interpreter by @aymeric-roucher in #31409
  • Donut: fix generate call from local path by @gante in #31470
  • Make "tool_use" the default chat template key when tools are passed by @Rocketknight1 in #31429
  • Fix single letter stop strings by @Rocketknight1 in #31448
  • Update chat template docs and bump Jinja version by @Rocketknight1 in #31455
  • Improve PreTrainedTokenizerFast loading time when there are many added tokens by @ydshieh in #31404
  • Fix documentation typos by @qgallouedec in #31476
  • Give more useful metric_for_best_model errors by @tomaarsen in #31450
  • Update perf_train_gpu_many.md by @remyleone in #31451
  • [GPT2] Add SDPA support by @vasqu in #31172
  • Fix autocast incompatibility in RecurrentGemma by @xplip in #30832
  • Use self.config_tester.run_common_tests() by @amyeroberts in #31431
  • [tests] rename test_config_object to test_ds_config_object by @faaany in #31403
  • Docs / AQLM: Clarify torch.compile support for AQLM by @younesbelkada in #31473
  • Mamba: add generative tests by @gante in #31478
  • Update object_detection.md by @jajupmochi in #31488
  • Add docs on zeroshot image classification prompt templates by @aliencaocao in #31343
  • auto-detect device when no device is passed to pipeline by @faaany in #31398
  • Fix typo: pas_token_id by @ftnext in #30894
  • Fix wandb integration with SetFit model by @timothepearce in #30021
  • Consider inheritance in type checking for tensors by @daemyung in #31378
  • Add valid columns check in _remove_unused_columns method by @arthasking123 in #31466
  • Fix a teeny-tiny typo in tokenization_utils_base.py's docstring by @sadra-barikbin in #31510
  • Fix mismatched ` in doc & other common typos by @jhwei in #31516
  • RWKV: enable generation tests by @gante in #31490
  • unskip 2 tests in cohere by @ydshieh in #31517
  • Revive Nightly/Past CI by @ydshieh in #31159
  • Deprecate legacy cache + use cache position by @zucchini-nlp in #31491
  • SPLIT PR: add user defined symbols and control symbols by @itazap in #31305
  • Removed torch.cuda.empty_cache from train loop. by @FoamoftheSea in #31530
  • Update mask_generation.md by @nicholicaron in #31543
  • Correct @is_flaky test decoration by @qubvel in #31480
  • Add implementation of spectrogram_batch by @ravenouse in #27159
  • chore: fix typos by @xiaoxianBoy in #31559
  • Update git templates by @ArthurZucker in #31539
  • Fix the error caused by incorrect use of logger in pipeline by @lanyun1103 in #31565
  • Fix bug about add_special_tokens and so on by @hiroshi-matsuda-rit in #31496
  • Add Jinja as a requirement with the right version cutoff by @Rocketknight1 in #31536
  • Fix doc typo in TrainingArguments by @qgallouedec in #31503
  • Fix is_torch_xpu_available for torch < 2.3 by @amyeroberts in #31573
  • Added version constraint on numpy for version <2.0 by @Resteklicken in #31569
  • Siglip: add _no_split_module by @zucchini-nlp in #31566
  • fix output data type of image classification by @jiqing-feng in #31444
  • add preprocessing_num_workers to run_classification.py by @jiahuanluo in #31586
  • Improve error message for mismatched copies in code blocks by @molbap in #31535
  • Add ViTImageProcessorFast to tests by @amyeroberts in #31424
  • docs: move translations to i18n by @SauravMaheshkar in #31584
  • Removed unnecessary self.projection call in VivitTubeletEmbeddings by @v-iashin in #31632
  • [GPT-NeoX] Add SDPA support by @vasqu in #31031
  • Update RT-DETR code snippet by @qubvel in #31631
  • Llama et al. / FSDP : Fix breaking change in 4.40 for FSDP by @younesbelkada in #31161
  • Fix RT-DETR inference with float16 and bfloat16 by @qubvel in #31639
  • Fix paligemma detection inference by @molbap in #31587
  • Generate: fix assisted generation with past_key_values passed as kwargs by @gante in #31644
  • Fix dtype casting in swinv2 and swinv2sr to allow non-FP32 inference by @aliencaocao in #31589
  • Skip tests properly by @amyeroberts in #31308
  • Generation: past kv can be None by @zucchini-nlp in #31051
  • Fix ONNX exports for Optimum compatible models by @merveenoyan in #31311

Significant community contributions

The following contributors have made significant changes to the library over the last release:
