github huggingface/transformers v4.47.0
v4.47.0: PaliGemma-2, I-JEPA, OLMo-2, LayerSkip, Tensor Parallel

14 days ago

New models

PaliGemma-2

PaliGemma 2 and PaliGemma are lightweight open vision-language models (VLM) inspired by PaLI-3, and based on open components like the SigLIP vision model and the Gemma language model. PaliGemma takes both images and text as inputs and can answer questions about images with detail and context, meaning that PaliGemma can perform deeper analysis of images and provide useful insights, such as captioning for images and short videos, object detection, and reading text embedded within images.

PaliGemma 2 is available in 3B, 10B, and 28B parameter sizes, which are based on Gemma 2 2B, 9B, and 27B models, respectively. The original PaliGemma models are available in the 3B size. For more information on Gemma model variants, see the Gemma models list. PaliGemma model variants support different pixel resolutions for image inputs, including 224 x 224, 448 x 448, and 896 x 896 pixels.

I-JEPA

The I-JEPA model was proposed in Image-based Joint-Embedding Predictive Architecture by Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann LeCun, Nicolas Ballas. I-JEPA is a self-supervised learning method that predicts the representations of one part of an image based on other parts of the same image. This approach focuses on learning semantic features without relying on pre-defined invariances from hand-crafted data transformations, which can bias specific tasks, or on filling in pixel-level details, which often leads to less meaningful representations.

OLMo 2

The OLMo2 model is the successor of the OLMo model, which was proposed in OLMo: Accelerating the Science of Language Models.

The architectural changes from the original OLMo model to this model are:

  • RMSNorm is used instead of standard layer norm.
  • Norm is applied to attention queries and keys.
  • Norm is applied after attention/feedforward layers rather than before.

Layer-Skip Llama

We add support for Meta's Layer-Skip Llama 3.2 1B model.

The Llama 3.2 1B model was continually pretrained with the LayerSkip recipe (early exit loss and layer dropout), as presented in Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding. It is capable of self-speculative decoding: it drafts tokens with its earlier layers and verifies them with the remaining layers.
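
The draft-then-verify loop can be sketched with a toy model. Everything below is a hypothetical illustration, not the LayerSkip or transformers implementation: the "layers" are small integer functions, the draft model simply stops after the first two layers, and verification is done token by token for clarity (a real implementation would score all drafted positions in one forward pass).

```python
VOCAB_SIZE = 10

# Toy "layers": each maps an integer hidden state to a new one.
LAYERS = [
    lambda h: 3 * h + 1,
    lambda h: h + 2,
    lambda h: h + 1 if h % 4 == 0 else h,  # later layers only sometimes change the answer
    lambda h: h,
]


def make_next_token_fn(layers):
    """Return a greedy next-token function that runs only the given layers."""
    def next_token(tokens):
        hidden = sum(tokens)
        for layer in layers:
            hidden = layer(hidden)
        return hidden % VOCAB_SIZE
    return next_token


draft_next = make_next_token_fn(LAYERS[:2])  # early exit after two layers
full_next = make_next_token_fn(LAYERS)       # all layers


def greedy_decode(prompt, next_token, num_new_tokens):
    """Plain greedy decoding, used as the reference."""
    tokens = list(prompt)
    for _ in range(num_new_tokens):
        tokens.append(next_token(tokens))
    return tokens


def self_speculative_decode(prompt, draft_next, full_next, num_new_tokens, num_draft=4):
    """Draft tokens with the early-exit model, keep only what the full model agrees with."""
    tokens = list(prompt)
    target_len = len(prompt) + num_new_tokens
    while len(tokens) < target_len:
        # 1) Draft a few tokens cheaply with the early layers.
        draft = []
        for _ in range(num_draft):
            draft.append(draft_next(tokens + draft))
        # 2) Verify: accept drafted tokens until the first disagreement,
        #    then take the full model's token at that position instead.
        accepted = []
        for token in draft:
            expected = full_next(tokens + accepted)
            accepted.append(expected)
            if token != expected:
                break
        tokens.extend(accepted)
    return tokens[:target_len]


prompt = [1, 2, 3]
speculative = self_speculative_decode(prompt, draft_next, full_next, num_new_tokens=12)
reference = greedy_decode(prompt, full_next, num_new_tokens=12)
```

Because every kept token is one the full model would have produced at that position, the greedy speculative output matches plain greedy decoding with the full model; the gain comes from verifying several drafted tokens at once.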

Tensor Parallel implementation

This PR uses the torch.distributed.tensor.parallel subpackage to implement Tensor Parallel for Llama (as an example).

The motivation is multi-fold:

  1. to make the modeling code as simple as the single-worker case:
    all manual TP implementations under if self.config.pretraining_tp > 1 can be removed.

  2. to make tensor parallelism easily accessible to users:
    a model.tensor_parallel(device_mesh) method is added that allows users to turn a single-process model into a parallel model.

This is the first PR of many to simplify and enable Tensor Parallel across models.

  • Simplify Tensor Parallel implementation with PyTorch TP by @kwen2501 in #34184
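
For readers new to tensor parallelism, the sketch below shows the general idea on a two-layer MLP in pure Python. It is not the code from the PR, which relies on torch.distributed.tensor.parallel. The helper names are hypothetical, and the choice of splitting the first weight by columns and the second by rows is a common pattern assumed here for illustration.

```python
def matmul(x, w):
    """Multiply x ([n][k]) by w ([k][m]) using nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def relu(m):
    return [[max(0, v) for v in row] for row in m]


def split_columns(w, num_shards):
    """Give each worker a contiguous block of columns (assumes even divisibility)."""
    step = len(w[0]) // num_shards
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(num_shards)]


def split_rows(w, num_shards):
    """Give each worker a contiguous block of rows (assumes even divisibility)."""
    step = len(w) // num_shards
    return [w[i * step:(i + 1) * step] for i in range(num_shards)]


def mlp_single(x, w1, w2):
    """Single-worker reference: relu(x @ w1) @ w2."""
    return matmul(relu(matmul(x, w1)), w2)


def mlp_tensor_parallel(x, w1, w2, num_workers):
    """Each worker holds a column shard of w1 and the matching row shard of w2."""
    w1_shards = split_columns(w1, num_workers)
    w2_shards = split_rows(w2, num_workers)
    # Each worker computes a partial output from its own shards only.
    partials = [matmul(relu(matmul(x, a)), b) for a, b in zip(w1_shards, w2_shards)]
    # Summing the partial outputs plays the role of the all-reduce step.
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*partials)]


x = [[1, -2, 3], [0, 4, -1]]
w1 = [[1, 0, -1, 2], [2, 1, 0, -1], [0, -3, 1, 1]]
w2 = [[1, 2], [0, -1], [3, 1], [-2, 0]]

single = mlp_single(x, w1, w2)
parallel = mlp_tensor_parallel(x, w1, w2, num_workers=2)
```

The elementwise activation can be applied per shard because each worker holds complete columns of the hidden layer, so a single reduction at the end is enough to recover the single-worker result.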

Farewell, Python 3.8

Python 3.8 has reached end of life and, as such, we drop it from our CI.

GGUF improvements

Several improvements have been made to the GGUF support in transformers, notably by adding new architectures to the list of supported architectures.

Fast processors

We continue the work to improve the speed of fast processors as detailed in this roadmap.

We contribute a fast processor to RT-DETR.

New pipelines

A new pipeline has been added to transformers: image-text-to-text!

The pipeline supports the following inputs:

  • unbatched images and text - images=image, text=text
  • batched images and text - images=[image, image], text=[text, text]
  • several images per prompt (only for models supporting the use of an image token) - images=[[image, image], [image]] or images=[image, image, image], text=["... ......", "......"]
  • Chat templates (for models supporting them).
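
The two accepted forms for several images per prompt can be related as follows. This is a hypothetical helper, not the pipeline's code: it assumes the flat list is assigned to prompts by counting image tokens in each text, and it assumes the token is spelled "<image>", which varies between models.

```python
def group_images_by_prompt(images, texts, image_token="<image>"):
    """Return one list of images per prompt.

    Accepts either the nested form ([[img, img], [img]]) or the flat form
    ([img, img, img]); the flat form is split using image-token counts.
    """
    # Nested form: already one group per prompt.
    if images and all(isinstance(item, list) for item in images):
        if len(images) != len(texts):
            raise ValueError("expected one image group per prompt")
        return [list(group) for group in images]

    # Flat form: hand out images in order, following the token counts.
    counts = [text.count(image_token) for text in texts]
    if sum(counts) != len(images):
        raise ValueError("number of images does not match number of image tokens")
    grouped, start = [], 0
    for count in counts:
        grouped.append(images[start:start + count])
        start += count
    return grouped


# Strings stand in for image objects in this sketch.
texts = ["<image><image> What differs between these?", "<image> Describe this."]
from_flat = group_images_by_prompt(["a.png", "b.png", "c.png"], texts)
from_nested = group_images_by_prompt([["a.png", "b.png"], ["c.png"]], texts)
```

Under these assumptions both calls describe the same assignment of images to prompts, which is why the flat form only works for models that mark image positions with a token.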

Notable refactors

Separate chat templates into a single file

We have had several issues with chat templates because they're stored as single lines in the JSON config files:

  • Impossible to review diffs
  • Very hard to edit in the web UI (or in general)
  • Differences between processor templates in chat_template.json and tokenizer templates in tokenizer_config.json causing confusion
  • Some models use multiple templates, requiring a template dict, but we're trying to discourage that in future and move those models to single templates with conditional behaviour instead

The solution:

  • Just move chat templates to a single chat_template.jinja file in the repo
  • If multiple templates are required, then they should still be stored in the JSON file. This is not supported for Processor classes, so processors should always be able to save their template as a raw Jinja file. In general, we'll be gently deprecating multiple templates in future.
  • If a chat_template.jinja file is present, it overrides the JSON files. If a tokenizer is loaded with both Jinja and JSON chat templates and resaved, it should save only the Jinja file, and not have any chat_template entry in tokenizer_config.json.

For now, we continue saving in the old format by default. I'll probably keep it this way for several versions before making the new format the default, to ensure that most users are able to load the new format before it becomes common. Until then, the new format should mostly be used for testing, to make sure it's ready for deployment when we do the switch.
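
The precedence rules above can be sketched with a small stand-alone loader. The helper names below are hypothetical and the code is not the tokenizer's actual loading logic; it only illustrates that a chat_template.jinja file overrides the JSON entry, and that saving in the new format leaves no chat_template entry in tokenizer_config.json.

```python
import json
import tempfile
from pathlib import Path


def load_chat_template(repo_dir):
    """Prefer chat_template.jinja; fall back to tokenizer_config.json."""
    repo = Path(repo_dir)
    jinja_file = repo / "chat_template.jinja"
    if jinja_file.exists():
        return jinja_file.read_text(encoding="utf-8")
    config_file = repo / "tokenizer_config.json"
    if config_file.exists():
        config = json.loads(config_file.read_text(encoding="utf-8"))
        return config.get("chat_template")
    return None


def save_chat_template(repo_dir, template):
    """New-format save: write the raw Jinja file and drop the JSON entry."""
    repo = Path(repo_dir)
    (repo / "chat_template.jinja").write_text(template, encoding="utf-8")
    config_file = repo / "tokenizer_config.json"
    if config_file.exists():
        config = json.loads(config_file.read_text(encoding="utf-8"))
        config.pop("chat_template", None)
        config_file.write_text(json.dumps(config, indent=2), encoding="utf-8")


with tempfile.TemporaryDirectory() as repo_dir:
    config_path = Path(repo_dir) / "tokenizer_config.json"
    config_path.write_text(
        json.dumps({"chat_template": "{{ json_template }}", "model_max_length": 8}),
        encoding="utf-8",
    )
    loaded_from_json = load_chat_template(repo_dir)

    # Adding a Jinja file makes it take precedence over the JSON entry.
    (Path(repo_dir) / "chat_template.jinja").write_text("{{ jinja_template }}", encoding="utf-8")
    loaded_with_both = load_chat_template(repo_dir)

    # Resaving keeps only the Jinja file as the source of the template.
    save_chat_template(repo_dir, loaded_with_both)
    config_after_save = json.loads(config_path.read_text(encoding="utf-8"))
```

Keeping the template in its own file means it is stored with real newlines, so diffs and edits in the web UI become readable.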

Large modular logic refactor

This PR largely reworks the logic we use in the modular converter. It is (hopefully) clearer and more maintainable. Instead of going in all directions, adding stuff, then deleting it if not needed, we now do the following:

  • visit the whole modular file (record import/function/class/assignment nodes)
    • create function dependency mapping
  • for each import coming from another model:
    • visit the corresponding file
    • create function dependency mapping
    • update mapping with function/assignment from the modular (updated/new functions)
    • create the class dependency graph based on merged dependencies
  • update dependency graph of the modular with the functions and assignments imported from the other files
  • for each class recorded in the modular:
    • if inheriting from a class in another file:
      • replace call to super
      • find the dependencies after the node was replaced
      • follow (updated with modular defs) dependency mapping to add all nodes
    • else:
      • only add needed imported functions (and their dependencies)
  • determine the needed imports and add them
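
The "function dependency mapping" step can be illustrated with the standard library's ast module. This is a simplified sketch and not the converter's own code, which may walk the syntax tree differently; the function names and the sample source are hypothetical.

```python
import ast


def function_dependency_mapping(source):
    """Map each top-level function to the other top-level functions it calls."""
    tree = ast.parse(source)
    functions = {node.name: node for node in tree.body if isinstance(node, ast.FunctionDef)}
    mapping = {}
    for name, node in functions.items():
        called = {
            call.func.id
            for call in ast.walk(node)
            if isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
        }
        # Keep only calls to functions defined in this file, ignoring self-recursion.
        mapping[name] = (called & set(functions)) - {name}
    return mapping


def all_dependencies(name, mapping):
    """Follow the mapping transitively to collect every function that name needs."""
    seen, stack = set(), [name]
    while stack:
        current = stack.pop()
        for dependency in mapping.get(current, ()):
            if dependency not in seen:
                seen.add(dependency)
                stack.append(dependency)
    return seen


SAMPLE_SOURCE = '''
def rotate_half(x):
    return x

def apply_rotary(q):
    return rotate_half(q)

def attention(q):
    return apply_rotary(q)

def unused_helper():
    return 0
'''

mapping = function_dependency_mapping(SAMPLE_SOURCE)
needed = all_dependencies("attention", mapping)
```

Following the mapping from a class or function that is actually used yields only the helpers it needs, which mirrors the "only add needed imported functions (and their dependencies)" step: here `unused_helper` is never pulled in.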

Community bugfixes and improvements

  • Remove graph breaks for torch.compile() in flash_attention_forward when Llama Model is padding free tuned by @Abhishek-TAMU in #33932
  • Better defaults by @ArthurZucker in #34026
  • translated gguf.md into chinese by @blueingman in #34163
  • CI: fix failures by @zucchini-nlp in #34371
  • Zamba is an LM by @LysandreJik in #34342
  • add code generation to natural language processing section by @furtnerthomas in #34333
  • Fix pil_torch_interpolation_mapping import in image_processing_detr_fast by @yonigozlan in #34375
  • Add code sample docstrings and checkpoint reference for GLM models by @h3110Fr13nd in #34360
  • refactor: remove redundant if-condition and improve type correctness for convert_tokens_to_ids by @winstxnhdw in #34030
  • Ignore unsupported kwarg in ProcessorMixin call by @yonigozlan in #34285
  • [PEFT] Add warning for missing key in LoRA adapter by @BenjaminBossan in #34068
  • Fix torch.fx issue related to the new loss_kwargs keyword argument by @michaelbenayoun in #34380
  • Correct the new defaults by @Cyrilvallez in #34377
  • [auto. ping] Avoid sending empty info + add more team members by @ydshieh in #34383
  • Fix glm by @Cyrilvallez in #34388
  • Use non nested images and batched text Idefics2/3 by @yonigozlan in #34222
  • Fix onnx non-exportable inplace aten op by @IlyasMoutawwakil in #34376
  • Fix right padding in LLaVA models by @zucchini-nlp in #34305
  • no filter by @ydshieh in #34391
  • SynthID: better example by @gante in #34372
  • Tests: upgrade test_eager_matches_sdpa_generate by @gante in #34386
  • Fix bnb training test failure by @matthewdouglas in #34414
  • Avoid check expected exception when it is on CUDA by @ydshieh in #34408
  • Fix typos in agents_advanced.md by @rudydel in #34405
  • [docs] Cache implementations by @stevhliu in #34325
  • Fix pix2struct by @IlyasMoutawwakil in #34374
  • pin tensorflow_probability<0.22 in docker files by @ydshieh in #34381
  • Tiny update after #34383 by @ydshieh in #34404
  • Fix batch size handling in prediction_loop for DataLoaderShard by @zeus2611 in #34343
  • exclude fsdp from delay_optimizer_creation by @eljandoubi in #34140
  • New option called "best" for args.save_strategy. by @seanswyi in #31817
  • [docs] update input documentation for MAMBA2 and MISTRAL models to include cache_position and attention_mask details by @h3110Fr13nd in #34322
  • 🌐 [i18n-KO] Translated model_doc/barthez.md to Korean by @Jwaminju in #33980
  • Apply linting to the important code blocks to make it readable by @ShubhamJagtap2000 in #34449
  • Torchao weights only + prequantized compatibility by @SunMarc in #34355
  • [i18n-ar] Translated file : docs/source/ar/fast_tokenizers.md into Arabic by @AhmedAlmaghz in #33034
  • enable average tokens across devices by @techkang in #34373
  • feat: run benchmarks on A100 by @McPatate in #34287
  • Add post_process_depth_estimation for GLPN by @alex-bene in #34413
  • LLaVA: latency issues by @zucchini-nlp in #34460
  • Generation: fix test by @zucchini-nlp in #34369
  • Fix CI by @zucchini-nlp in #34458
  • use a tinymodel to test generation config which avoid timeout by @techkang in #34482
  • 🚨🚨🚨 [SuperPoint] Fix keypoint coordinate output and add post processing by @sbucaille in #33200
  • Simplify running tests in a subprocess by @ydshieh in #34213
  • Fix perplexity computation in perplexity.md by @Framartin in #34387
  • Fixes for Modular Converter on Windows by @hlky in #34266
  • Fix regression loading dtype by @SunMarc in #34409
  • Bert is ExecuTorch compatible by @guangy10 in #34424
  • manual head_dim for mixtral model by @wavy-jung in #34281
  • fix-qwen2vl-no-position_ids by @simonJJJ in #33487
  • Bug fix for drop path decay rate in swin transformer by @abhi-glitchhg in #34291
  • MobileBERT is ExecuTorch compatible by @guangy10 in #34473
  • Albert is ExecuTorch compatible by @guangy10 in #34476
  • Adding optimizer_cls_and_kwargs to Trainer.__init__ by @apoorvkh in #34358
  • Fix performance in get_imports regexp by @AlekseyLobanov in #34298
  • fix incorrect warning by @yonigozlan in #34416
  • Un-deprecate timeout arg in pipelines by @Rocketknight1 in #34382
  • Roberta is ExecuTorch compatible by @guangy10 in #34425
  • Fix format mistake in string repr of tokenizer objects by @gpetho in #34493
  • Mllama: update docs by @zucchini-nlp in #34334
  • VLMs: fix number of image tokens by @zucchini-nlp in #34332
  • Tests: move generate tests to the right mixin and delete redundant tests by @gante in #34464
  • fix pixtral processor by @molbap in #34486
  • Use torch 2.5 in scheduled CI by @ydshieh in #34465
  • Fix super tiny extra space typo by @fzyzcjy in #34440
  • UPDATE Documentation for #TRANSLATING.md Documentation into Multiple Languages.(Changes made) by @anshumangahlot in #34226
  • enable QA bf16 pipeline by @jiqing-feng in #34483
  • Fix: img size mismatch caused by incorrect unpadding in LLaVA-Next by @jp1924 in #34522
  • Fix step shifting when accumulate gradient by @kibitzing in #33673
  • avoid calling gc.collect and cuda.empty_cache by @ydshieh in #34514
  • Qwen2VL: skip base input_ids-inputs_embeds equivalence check by @gante in #34535
  • fix(DPT,Depth-Anything) Address expected_slice errors inside inference tests by @philkuz in #34518
  • feat: add benchmarks pg indexes by @McPatate in #34536
  • make test_eager_matches_sdpa_inference less flaky by @ydshieh in #34512
  • Bug Fix for issue #34294 by @fpgaminer in #34295
  • [CLIPSeg] Make interpolate_pos_encoding default to True by @NielsRogge in #34419
  • update doc by @jiqing-feng in #34478
  • [i18n-ar] Translated file : docs/source/ar/multilingual.md into Arabic by @AhmedAlmaghz in #33048
  • Blip: get/set input embeddings correctly by @zucchini-nlp in #34152
  • BLIP: enable generation tests by @zucchini-nlp in #34174
  • 🔴 🔴 fix query_pre_attn_scalar different of num_heads in default gemma2 config by @molbap in #34540
  • [i18n-HI] Translated accelerate page to Hindi by @karthik-script in #34443
  • Update trainer for easier handling of accumulate, compile fixes, and proper reporting by @muellerzr in #34511
  • VLM: special multimodal Tokenizer by @zucchini-nlp in #34461
  • MPS: isin_mps_friendly can support 0D tensors by @gante in #34538
  • Add text support to the Trainer's TensorBoard integration by @JacobLinCool in #34418
  • [i18n-HI] Translated TFLite page to Hindi by @karthik-script in #34572
  • 🌐 [i18n-KO] Translated perf_train_special.md to Korean by @maximizemaxwell in #34590
  • 🌐 [i18n-KO] Update README_ko.md by @J4BEZ in #33098
  • fix TrainerState doc because num_input_tokens_seen is unused by defau… by @techkang in #34593
  • Fix Whisper CI by @ydshieh in #34541
  • Skip DeepSpeed ZeRO Stage 3 model initialization when bnb by @eljandoubi in #34395
  • FIX: Broken repr of TorchAoConfig by @BenjaminBossan in #34560
  • Load sub-configs from composite configs by @zucchini-nlp in #34410
  • DistilBERT is ExecuTorch compatible by @guangy10 in #34475
  • Remove unused test_dataset by @thisisiron in #34516
  • Revert "Fix Whisper CI" by @ydshieh in #34605
  • Fix #34494 assistant tokens when truncated by @yonigottesman in #34531
  • Remove @slow for test_eager_matches_sdpa_inference by @ydshieh in #34558
  • Changing repr in torchao to show quantized Linear by @MekkCyber in #34202
  • Fix torchvision interpolation CI by @yonigozlan in #34539
  • 🌐 [i18n-KO] Translated convbert.md to Korean by @ahnjj in #34599
  • fix(dvclive): pass fake dataset to avoid exception in trainer init by @shcheklein in #34455
  • 🌐 [i18n-KO] Translated timesformer.md to Korean by @mreraser in #33972
  • 🌐 [i18n-KO] Translated bert.md to Korean by @maximizemaxwell in #34627
  • [i18n-ar] Translated file : docs/source/ar/trainer.md into Arabic by @AhmedAlmaghz in #33080
  • Update llm_engine.py by @louisbrulenaudet in #33332
  • Agents: turn any Space into a Tool with Tool.from_space() by @aymeric-roucher in #34561
  • [docs] update not-working model revision by @faaany in #34682
  • [i18n-ar] Translated file : docs/source/ar/torchscript.md into Arabic by @AhmedAlmaghz in #33079
  • Agents: Small fixes in streaming to gradio + add tests by @aymeric-roucher in #34549
  • 🌐 [i18n-KO] Translated marian.md to Korean by @maximizemaxwell in #34698
  • [docs] Broken link in generation_strategies by @pcuenca in #34717
  • Fix example in EsmConfig docstring by @yuanx749 in #34653
  • [docs] add xpu device check by @faaany in #34684
  • Retain newlines in chat template when continue_final_message=True by @lewtun in #34253
  • Update llava.md by @LysandreJik in #34749
  • fix(wandb): pass fake dataset to avoid exception in trainer (see #34455) by @CezaPasc in #34720
  • add xpu path for awq by @jiqing-feng in #34712
  • FSDP grad accum fix by @winglian in #34645
  • Remove FSDP wrapping from sub-models. by @eljandoubi in #34452
  • 🧼 remove v4.44 deprecations by @gante in #34245
  • VLMs: patch_size -> num_image_tokens in processing by @zucchini-nlp in #33424
  • Fix broken link by @ofek in #34618
  • fix a typo bug where 'id2label' was incorrectly written as 'i2label' when reading config by @ZuoChenFttS in #34637
  • Fix skip of test_training_gradient_checkpointing by @dvrogozh in #34723
  • make sure to disable gradients for integer tensor by @winglian in #32943
  • [docs] make empty_cache device-agnostic by @faaany in #34774
  • [docs] add XPU besides CUDA, MPS etc. by @faaany in #34777
  • [tests] add XPU part to testing by @faaany in #34778
  • fix: Update pixel_values parameter in hf_model input by @thisisiron in #34782
  • Fix callback key name by @jung-hunsoo in #34762
  • fix: Wrong task mentioned in docs by @ecyht2 in #34757
  • Allow handling files as args for a tool created with Tool.from_space by @aymeric-roucher in #34687
  • Fix Whisper CI by @ydshieh in #34617
  • protect tensor parallel usage by @ArthurZucker in #34800
  • Trainer hyperparameter search kwargs docs update by @GuillemGSubies in #34459
  • feat: allow to use hf-hub models for timm backbone by @cgebbe in #34729
  • Support gradient checkpointing in Qwen2VL ViT by @li-plus in #34724
  • Fix: siglip image processor rgb_convert is not being applied correctly. by @jp1924 in #34301
  • fix cpu bnb path by @jiqing-feng in #34647
  • Gemma capping by @ArthurZucker in #34282
  • Fix cache_utils for optimum.quanto kvcache quantization by @SunMarc in #34750
  • Modular fix by @Cyrilvallez in #34802
  • MLU devices : Checks if mlu is available via an cndev-based check which won't trigger the drivers and leave mlu by @huismiling in #34326
  • 🚨🚨🚨 fix(Mask2Former): torch export 🚨🚨🚨 by @philkuz in #34393
  • Feature: print tokens per second during training by @tibor-reiss in #34507
  • Add do_convert_rgb to vit by @jp1924 in #34523
  • Fix post process function called in the instance segmentation example of mask2former by @OnTheThirdDay in #34588
  • fix crash in tiiuae/falcon-11B-vlm image-to-text generation by @sywangyi in #34728
  • Add support for OpenAI api "image_url" input in chat for image-text-to-text pipeline by @yonigozlan in #34562
  • Add Image Processor Fast Deformable DETR by @yonigozlan in #34353
  • Run test_medium_seamless_m4t_pt in subprocess to avoid many failures by @ydshieh in #34812
  • Fix check_training_gradient_checkpointing by @ydshieh in #34806
  • Added image-text-to-text pipeline to task guide by @merveenoyan in #34783
  • Translate attention.md into Chinese by @wwwbai in #34716
  • LLaVA OV: fix unpadding precision by @zucchini-nlp in #34779
  • Fix low memory beam search by @zucchini-nlp in #34746
  • Fix the memory usage issue of logits in generate() by @kjohew in #34813
  • fix(DPT,Depth-Anything) torch.export by @philkuz in #34103
  • Fix: take into account meta device by @tibor-reiss in #34134
  • Fix hyperparameter search when optuna+deepspeed by @corentin-ryr in #34642
  • Fix CI by tweaking torchao tests by @SunMarc in #34832
  • Fix CI slack reporting issue by @ydshieh in #34833
  • VLMs: enable generation tests - last batch by @zucchini-nlp in #34484
  • Change logging level from warning to info for max_steps overriding num_train_epochs by @qgallouedec in #34810
  • Fix ds nvme by @eljandoubi in #34444
  • Fix heuristic scheduling for UAG by @jmamou in #34805
  • Refactor StarCoder2 using modular by @Cyrilvallez in #34015
  • Watermarking: fix order by @zucchini-nlp in #34849
  • Update checks for torch.distributed.tensor to require torch >= 2.5 by @loadams in #34816
  • Remove quantization related config from dequantized model by @konradkalita in #34856
  • Auto compile when static cache by @ArthurZucker in #34247
  • Speculative decoding: Test the target distribution (to prevent issues like #32867) by @keyboardAnt in #34553
  • smol improvements to support more flexible usage by @andimarafioti in #34857
  • [CI] Skip EETQ tests while package is broken with latest transformers by @BenjaminBossan in #34854
  • Bitnet test fix to avoid using gated model by @MekkCyber in #34863
  • Fix support for image processors modifications in modular by @yonigozlan in #34866
  • Fix: Enable prefill phase key value caching of nemotron/minitron models by @jeongin601 in #34742
  • Add safe_globals to resume training on PyTorch 2.6 by @dvrogozh in #34632
  • Cache: init empty cache when use_cache by @zucchini-nlp in #34274
  • BLIP: fix generation after hub update by @zucchini-nlp in #34876
  • [Deberta/Deberta-v2] Refactor code base to support compile, export, and fix LLM by @ArthurZucker in #22105
  • 🔴 Mllama: fix base prefix by @zucchini-nlp in #34874
  • Sum gathered input tokens by @techkang in #34554
  • allow unused input parameters passthrough when chunking in asr pipelines by @VictorAtIfInsurance in #33889
  • prepare_fa2_from_position_ids function bugfix by @meliksahturker in #33269
  • chore: fix some typos by @wanxiangchwng in #34891
  • Fix convert_tokens_to_string when decoder is None by @dszeto in #34569
  • [peft] Given that self.active_adapter is deprecated, avoid using it by @tomaarsen in #34804
  • Fix Qwen2 failing tests by @jla524 in #34819
  • Fix : BitNet tests by @MekkCyber in #34895
  • [AWQ, CI] Bump AWQ version used in docker image by @BenjaminBossan in #34922
  • fix static cache data type miss-match by @jiqing-feng in #34799
  • Fix test_auto_backbone_timm_model_from_pretrained by @ydshieh in #34877
  • Upgrade torch version to 2.5 in dockerfile for quantization CI by @MekkCyber in #34924
  • Fix failing GGML test by @MekkCyber in #34871
  • Updated documentation and added conversion utility by @ViktorooReps in #34319
  • making gpt2 fx traceable by @xuzifei-dmatrix in #34633
  • Fix import structure for Fast Image processors by @yonigozlan in #34859
  • VideoLLaVA: add default values by @zucchini-nlp in #34916
  • Skipping aqlm non working inference tests till fix merged by @MekkCyber in #34865
  • [Whisper] Fix whisper integration tests by @eustlb in #34111
  • Add Pytorch Tensor Parallel support for Mistral by @VladOS95-cyber in #34927
  • change apply_rotary_pos_emb of Glmmodel for GLM-Edge Series model by @zRzRzRzRzRzRzR in #34629
  • Fix torch.onnx.export of Qwen2-VL vision encoder by @xenova in #34852
  • Update the Python version in the Chinese README to match the English README. by @vansin in #34870
  • [i18n-ar] Translated file : docs/source/ar/benchmarks.md into Arabic by @AhmedAlmaghz in #33023
  • [docs] use device-agnostic API instead of cuda by @faaany in #34913
  • [doc] use full path for run_qa.py by @faaany in #34914
  • docs: HUGGINGFACE_HUB_CACHE -> HF_HUB_CACHE by @imba-tjd in #34904
  • [i18n-zh]Translated tiktoken.md into chinese by @blueingman in #34936
  • [FlexAttention] Update gemma2 by @ArthurZucker in #34942
  • Fix : Add PEFT from source to CI docker by @MekkCyber in #34969
  • Avoid calling get_max_length by @ydshieh in #34971
  • Fix flaky test execution caused by Thread by @ydshieh in #34966
  • 🌐 [i18n-KO] Translated encoder-decoder.md to Korean by @maximizemaxwell in #34880
  • [docs] add explanation to release_memory() by @faaany in #34911
  • [i18n-zh]Translated perf_train_special.md into Chinese by @blueingman in #34948
  • Fix typo in code block in vipllava.md by @yuanx749 in #34957
  • Fixed typo in VisitWebpageTool by @sergiopaniego in #34978
  • [PEFT] Set eval mode when loading PEFT adapter by @BenjaminBossan in #34509
  • Fix save_pretrained for partially offloaded models by @kylesayrs in #34890
  • 🚨🚨🚨 Changed DINOv2Config default patch size to 14 by @OFSkean in #34568
  • Refine the code of Universal Assisted Generation by @xinpengzz in #34823
  • Allow compressed-tensors quantized model to be trained by @horheynm in #34520
  • Offloaded cache: fix generate by @zucchini-nlp in #34921
  • Fix utils/check_bad_commit.py (for auto ping in CI) by @ydshieh in #34943
  • Add optimized PixtralImageProcessorFast by @mgoin in #34836
  • Improve .from_pretrained type annotations by @qubvel in #34973
  • Fix docker CI : install autogptq from source by @MekkCyber in #35000
  • Let server decide default repo visibility by @Wauplin in #34999
  • 🚨🚨🚨 Uniformize kwargs for TrOCR Processor by @tibor-reiss in #34587
  • Update timm version by @qubvel in #35005
  • fix: double verbs by @SamuelLarkin in #35008
  • Update FillMaskPipeline.__call__ signature and docstring by @alvarobartt in #35006
  • Only cast cu_seqlens when tracing by @xenova in #35016
  • fix variable undefined bug when return_tensors is not specified in llava processing by @chenweize1998 in #34953
  • Optimize memory usage of mllama encoder by @milesial in #34930
  • Typo in warning switching to optimum-quanto by @Bojun-Feng in #35028
  • Add type hints for forward functions in Gemma2 by @jla524 in #35034
  • Fix test_eager_matches_sdpa_inference for XPU backend by @dvrogozh in #34889
  • Multiple typo fixes in Tutorials docs by @henryhmko in #35035
  • add docstring example for compute_loss_func by @secrettoad in #35020
  • [i18n-ar] Translated file : docs/source/ar/notebooks.md into Arabic by @AhmedAlmaghz in #33049
  • [docs] add the missing import for Image and bug fix by @faaany in #34776
  • Translate bertology.md into Chinese by @wwwbai in #34908
  • Automatic compilation in generate: do not rely on inner function by @Cyrilvallez in #34923
  • Add token cost + runtime monitoring to Agent and HfEngine children by @aymeric-roucher in #34548
  • Fix BertGeneration by @ydshieh in #35043
  • fix speecht5 failure issue in test_peft_gradient_checkpointing_enable… by @sywangyi in #34454
  • [docs] fix example code bug by @faaany in #35054
  • Translate community.md into Chinese by @wwwbai in #35013
  • [docs] use device-agnostic instead of cuda by @faaany in #35047
  • [docs] use device-agnostic API instead of hard-coded cuda by @faaany in #35048
  • Fix pad_token_tensor is None in warning by @tshu-w in #34005
  • Add Pytorch Tensor Parallel support for Qwen2, Qwen2Moe, Starcoder2 by @VladOS95-cyber in #35007
  • [GPTNeoX] Flex Attention + Refactor by @vasqu in #34896
  • Support for easier multimodal use of modular by @Cyrilvallez in #35056
  • [docs] add a comment that offloading requires CUDA GPU by @faaany in #35055
  • [docs] Increase visibility of torch_dtype="auto" by @stevhliu in #35067
  • Informative by @ydshieh in #35059
  • [Whisper] Fix whisper tokenizer by @eustlb in #34537
  • [tokenizers] bump to 0.21 by @ArthurZucker in #34972
  • Update Mistral conversion script by @Cyrilvallez in #34829
  • Fix tie_word_embeddings handling for GGUF models by @Isotr0py in #35085
  • Deprecate quanto and switch to optimum-quanto by @MekkCyber in #35001
  • BLIP: this is correct now by @zucchini-nlp in #35081
  • [trainer] fix the GA model_accepts_loss_kwargs by @ArthurZucker in #34915
  • Fix flaky Hub CI (test_trainer.py) by @ydshieh in #35062
  • Adaptive dynamic number of speculative tokens by @jmamou in #34156

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @AhmedAlmaghz
    • [i18n-ar] Translated file : docs/source/ar/fast_tokenizers.md into Arabic (#33034)
    • [i18n-ar] Translated file : docs/source/ar/multilingual.md into Arabic (#33048)
    • [i18n-ar] Translated file : docs/source/ar/trainer.md into Arabic (#33080)
    • [i18n-ar] Translated file : docs/source/ar/torchscript.md into Arabic (#33079)
    • [i18n-ar] Translated file : docs/source/ar/benchmarks.md into Arabic (#33023)
  • @maximizemaxwell
    • 🌐 [i18n-KO] Translated perf_train_special.md to Korean (#34590)
    • 🌐 [i18n-KO] Translated bert.md to Korean (#34627)
    • 🌐 [i18n-KO] Translated marian.md to Korean (#34698)
    • 🌐 [i18n-KO] Translated encoder-decoder.md to Korean (#34880)
  • @2015aroras
    • Add OLMo November 2024 (#34551)
    • Rename OLMo November to OLMo2 (#34864)
  • @mgoin
    • Add optimized PixtralImageProcessorFast (#34836)
