New models
Phi3
The Phi-3 model was proposed in Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone by Microsoft.
TLDR; Phi-3 introduces new RoPE scaling methods, which seem to scale fairly well! Phi-3-mini is available in two context-length variants, 4K and 128K tokens. It is the first model in its class to support a context window of up to 128K tokens, with little impact on quality.
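A minimal sketch of loading the model, assuming the `microsoft/Phi-3-mini-4k-instruct` checkpoint name (swap in the 128K variant if you need the longer context):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Checkpoint name is an assumption; a 128K-context variant is also available.
model_id = "microsoft/Phi-3-mini-4k-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```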
JetMoE
JetMoE-8B is an 8B Mixture-of-Experts (MoE) language model developed by Yikang Shen and MyShell. The JetMoE project aims to provide LLaMA2-level performance from an efficient language model trained on a limited budget. To achieve this goal, JetMoE uses a sparsely activated architecture inspired by ModuleFormer. Each JetMoE block consists of two MoE layers: Mixture of Attention Heads and Mixture of MLP Experts. Given the input tokens, it activates a subset of its experts to process them. This sparse activation scheme enables JetMoE to achieve much better training throughput than similarly sized dense models: around 100B tokens per day on a cluster of 96 H100 GPUs with a straightforward 3-way pipeline parallelism strategy.
- Add JetMoE model by @yikangshen in #30005
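As a rough sketch, JetMoE loads like any other causal LM (the checkpoint id below is an assumption based on the JetMoE release):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Checkpoint id is an assumption.
model_id = "jetmoe/jetmoe-8b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The main benefit of sparse MoE layers is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```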
PaliGemma
PaliGemma is a lightweight open vision-language model (VLM) inspired by PaLI-3, and based on open components like the SigLIP vision model and the Gemma language model. PaliGemma takes both images and text as inputs and can answer questions about images with detail and context, meaning that PaliGemma can perform deeper analysis of images and provide useful insights, such as captioning for images and short videos, object detection, and reading text embedded within images.
More than 120 checkpoints have been released; see the collection here!
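A minimal image-captioning sketch (the checkpoint name is an assumption; see the collection for all released variants):

```python
import requests
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

# Checkpoint name is an assumption.
model_id = "google/paligemma-3b-mix-224"

processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# PaliGemma is steered with short task prompts such as "caption en" or "detect cat".
inputs = processor(text="caption en", images=image, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(processor.decode(outputs[0], skip_special_tokens=True))
```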
VideoLlava
Video-LLaVA exhibits remarkable interactive capabilities between images and videos, despite the absence of image-video pairs in the dataset.
💡 Simple baseline, learning united visual representation by alignment before projection
With the binding of unified visual representations to the language feature space, we enable an LLM to perform visual reasoning capabilities on both images and videos simultaneously.
🔥 High performance, complementary learning with video and image
Extensive experiments demonstrate the complementarity of modalities, showcasing significant superiority when compared to models specifically designed for either images or videos.
- Add Video Llava by @zucchini-nlp in #29733
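A minimal sketch of running Video-LLaVA on a clip, assuming the `LanguageBind/Video-LLaVA-7B-hf` checkpoint name and using dummy frames in place of a real decoded video:

```python
import numpy as np
from transformers import VideoLlavaProcessor, VideoLlavaForConditionalGeneration

# Checkpoint name is an assumption.
model_id = "LanguageBind/Video-LLaVA-7B-hf"

processor = VideoLlavaProcessor.from_pretrained(model_id)
model = VideoLlavaForConditionalGeneration.from_pretrained(model_id)

# Dummy clip of 8 frames, shaped (num_frames, height, width, channels);
# replace with frames decoded from a real video.
video = np.random.randint(0, 255, (8, 224, 224, 3), dtype=np.uint8)
prompt = "USER: <video>\nWhat is happening in this video? ASSISTANT:"

inputs = processor(text=prompt, videos=video, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```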
Falcon 2 and FalconVLM
Two new models from TII-UAE! They published a blog post with more details. Falcon2 introduces a parallel MLP, and Falcon VLM uses the LLaVA framework.
- Support for Falcon2-11B by @Nilabhra in #30771
- Support arbitrary processor by @ArthurZucker in #30875
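Falcon2-11B loads like any other causal LM; a quick sketch (the checkpoint id is an assumption):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Checkpoint id is an assumption.
model_id = "tiiuae/falcon-11B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```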
GGUF from_pretrained support
You can now load most GGUF quantized files directly with transformers' `from_pretrained`, which converts them to a classic PyTorch model. The API is simple:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
filename = "tinyllama-1.1b-chat-v1.0.Q6_K.gguf"

tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)
```
We plan closer integrations with the llama.cpp / GGML ecosystem in the future; see #27712 for more details.
- Loading GGUF files support by @LysandreJik in #30391
Quantization
New quant methods
In this release we support new quantization methods: HQQ & EETQ, contributed by the community. Read more about how to quantize any transformers model using HQQ & EETQ in the dedicated documentation section.
- Add HQQ quantization support by @mobicham in #29637
- [FEAT]: EETQ quantizer support by @dtlzhuangz in #30262
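As a rough sketch (both backends need a CUDA GPU and the `hqq` / `eetq` packages installed), quantization happens on load via a config object:

```python
from transformers import AutoModelForCausalLM, EetqConfig, HqqConfig

# 4-bit HQQ quantization on load (requires the `hqq` package).
hqq_model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    quantization_config=HqqConfig(nbits=4, group_size=64),
    device_map="cuda",
)

# int8 EETQ quantization on load (requires the `eetq` package).
eetq_model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    quantization_config=EetqConfig("int8"),
    device_map="cuda",
)
```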
dequantize API for bitsandbytes models
In case you want to dequantize models that have been loaded with bitsandbytes, this is now possible through the `dequantize` API (e.g. to merge adapter weights).
- FEAT / Bitsandbytes: Add `dequantize` API for bitsandbytes quantized models by @younesbelkada in #30806

API-wise, you can achieve that with the following:
```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, AutoTokenizer

model_id = "facebook/opt-125m"

model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=BitsAndBytesConfig(load_in_4bit=True))
tokenizer = AutoTokenizer.from_pretrained(model_id)

model.dequantize()

text = tokenizer("Hello my name is", return_tensors="pt").to(0)

out = model.generate(**text)
print(tokenizer.decode(out[0]))
```
Generation updates
- Add Watermarking LogitsProcessor and WatermarkDetector by @zucchini-nlp in #29676 (see the sketch after this list)
- Cache: Static cache as a standalone object by @gante in #30476
- Make `Gemma` work with `torch.compile` by @ydshieh in #30775
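For the new watermarking support, a rough sketch of generating with a watermark and then detecting it (the small model id is chosen just for illustration):

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    WatermarkDetector,
    WatermarkingConfig,
)

# Small model id chosen just for illustration.
model_id = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer(["Alice and Bob are"], return_tensors="pt")

# Bias a "green" token list during generation so the output carries a statistical watermark.
watermarking_config = WatermarkingConfig(bias=2.5, seeding_scheme="selfhash")
out = model.generate(**inputs, watermarking_config=watermarking_config, do_sample=False, max_new_tokens=20)

# The detector must be built with the same watermarking config used at generation time.
detector = WatermarkDetector(model_config=model.config, device="cpu", watermarking_config=watermarking_config)
print(detector(out, return_dict=True).prediction)
```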
SDPA support
🚨 might be breaking
- 🚨🚨🚨 Deprecate `evaluation_strategy` to `eval_strategy` 🚨🚨🚨 by @muellerzr in #30190 (the rename is shown in the sketch after this list)
- 🚨 Add training compatibility for Musicgen-like models by @ylacombe in #29802
- 🚨 Update image_processing_vitmatte.py by @rb-synth in #30566
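The `evaluation_strategy` rename is mechanical; updating existing `TrainingArguments` is a one-word change:

```python
from transformers import TrainingArguments

# Before (now deprecated):
# args = TrainingArguments(output_dir="out", evaluation_strategy="epoch")

# After:
args = TrainingArguments(output_dir="out", eval_strategy="epoch")
```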
Cleanups
- Remove task guides auto-update in favor of links towards task pages by @LysandreJik in #30429
- Remove add-new-model in favor of add-new-model-like by @LysandreJik in #30424
- Remove mentions of models in the READMEs and link to the documentation page in which they are featured. by @LysandreJik in #30420
Not breaking but important for Llama tokenizers
- [`LlamaTokenizerFast`] Refactor default llama by @ArthurZucker in #28881
Fixes
- Fix: remove `pad token id` in pipeline forward arguments by @zucchini-nlp in #30285
- disable use_cache if using gradient checkpointing by @chenzizhao in #30320
- Fix test transposing image with EXIF Orientation tag by @albertvillanova in #30319
- Fix `AssertionError` in clip conversion script by @ydshieh in #30321
- [UDOP] Add special tokens to tokenizer by @NielsRogge in #29594
- feat: Upgrade Weights & Biases callback by @parambharat in #30135
- [Feature Extractors] Fix kwargs to pre-trained by @sanchit-gandhi in #30260
- Pipeline: fix `pad_token_id` again by @zucchini-nlp in #30338
- [Whisper] Fix slow tests by @sanchit-gandhi in #30152
- Transformers Metadata by @LysandreJik in #30344
- Deprecate default chat templates by @Rocketknight1 in #30346
- Do not remove half seq length in generation tests by @zucchini-nlp in #30016
- Fix config + attn_implementation in AutoModelForCausalLM.from_pretrained by @hiyouga in #30299
- [Grounding DINO] Add resources by @NielsRogge in #30232
- Nits for model docs by @merveenoyan in #29795
- GenerationConfig: warn if pad token is negative by @zucchini-nlp in #30187
- Add FSDP config for CPU RAM efficient loading through accelerate by @helloworld1 in #30002
- `Llama` family, fix `use_cache=False` generation by @ArthurZucker in #30380
- Update docstrings for text generation pipeline by @Rocketknight1 in #30343
- Terminator strings for generate() by @Rocketknight1 in #28932
- Fix layerwise GaLore optimizer hard to converge with warmup scheduler by @hiyouga in #30372
- FIX / PEFT: Pass device correctly to peft by @younesbelkada in #30397
- Add sdpa and fa2 the Wav2vec2 family. by @kamilakesbi in #30121
- show `-rs` to show skip reasons by @ArthurZucker in #30318
- Add inputs embeds in generation by @zucchini-nlp in #30269
- [Grounding DINO] Add support for cross-attention in GroundingDinoMultiHeadAttention by @EduardoPach in #30364
- remove redundant logging from longformer by @riklopfer in #30365
- fix: link to HF repo/tree/revision when a file is missing by @mapmeld in #30406
- [tests] add `require_torch_sdpa` for test that needs sdpa support by @faaany in #30408
- Fix on "cache position" for assisted generation by @zucchini-nlp in #30068
- fix for itemsize => element_size() for torch backwards compat by @winglian in #30133
- Make EosTokenCriteria compatible with mps by @pcuenca in #30376
- FIX: re-add bnb on docker image by @younesbelkada in #30427
- Remove old TF port docs by @Rocketknight1 in #30426
- Rename torch.run to torchrun by @steven-basart in #30405
- Fix use_cache for xla fsdp by @alanwaketan in #30353
- [`LlamaTokenizerFast`] Refactor default llama by @ArthurZucker in #28881
- New model PR needs green (slow tests) CI by @ydshieh in #30341
- Add llama3 by @ArthurZucker in #30334
- [`Llava`] + CIs fix red cis and llava integration tests by @ArthurZucker in #30440
- fix uncaught init of linear layer in clip's/siglip's for image classification models by @vasqu in #30435
- [SegGPT] Fix loss calculation by @EduardoPach in #30421
- Add `paths` filter to avoid the chance of being triggered by @ydshieh in #30453
- Fix wrong indent in `utils/check_if_new_model_added.py` by @ydshieh in #30456
- [`research_project`] Most of the security issues come from this requirement.txt by @ArthurZucker in #29977
- Neuron: When save_safetensor=False, no need to move model to CPU by @jeffhataws in #29703
- Enable fp16 on CPU by @muellerzr in #30459
- Non blocking support to torch DL's by @muellerzr in #30465
- consistent job / pytest report / artifact name correspondence by @ydshieh in #30392
- Workflow / ENH: Add SSH into our runners workflow by @younesbelkada in #30425
- FIX / Workflow: Change tailscale trigger condition by @younesbelkada in #30471
- FIX / Workflow: Fix SSH workflow bug by @younesbelkada in #30474
- [fix codellama conversion] by @ArthurZucker in #30472
- Script for finding candidate models for deprecation by @amyeroberts in #29686
- Fix SigLip classification doctest by @amyeroberts in #30475
- Don't run fp16 MusicGen tests on CPU by @amyeroberts in #30466
- Prevent crash with `WandbCallback` with third parties by @tomaarsen in #30477
- Add WSD scheduler by @visheratin in #30231
- Fix Issue #29817 Video Classification Task Guide Using Undeclared Variables by @manju-rangam in #30457
- Make accelerate install non-torch dependent by @muellerzr in #30463
- Introduce Stateful Callbacks by @muellerzr in #29666
- Fix Llava for 0-embeddings by @zucchini-nlp in #30473
- Do not use deprecated `SourceFileLoader.load_module()` in dynamic module loading by @XuehaiPan in #30370
- Add sidebar tutorial for chat models by @Rocketknight1 in #30401
- Quantization: `HfQuantizer` quant method update by @younesbelkada in #30484
- [docs] Spanish translation of pipeline_tutorial.md by @aaronjimv in #30252
- FEAT: PEFT support for EETQ by @younesbelkada in #30449
- Fix the `bitsandbytes` error formatting ("Some modules are dispatched on ...") by @kyo-takano in #30494
- Update `dtype_byte_size` to handle torch.float8_e4m3fn/float8_e5m2 types by @mgoin in #30488
- Use the Keras set_random_seed in tests by @Rocketknight1 in #30504
- Remove skipping logic now that set_epoch exists by @muellerzr in #30501
- [`DETR`] Remove timm hardcoded logic in modeling files by @amyeroberts in #29038
- [examples] update whisper fine-tuning by @sanchit-gandhi in #29938
- Fix GroundingDINO, DPR after BERT SDPA update by @amyeroberts in #30506
- load_image - decode b64encode and encodebytes strings by @amyeroberts in #30192
- [SegGPT] Fix seggpt image processor by @EduardoPach in #29550
- Fix link in dbrx.md by @eitanturok in #30509
- Allow boolean FSDP options in fsdp_config by @helloworld1 in #30439
- Pass attn_implementation when using AutoXXX.from_config by @amyeroberts in #30507
- Fix broken link to Transformers notebooks by @clinty in #30512
- Fix repo. fetch/checkout in PR slow CI job by @ydshieh in #30537
- Reenable SDPA's FA2 During Training with torch.compile by @warner-benjamin in #30442
- Include safetensors as part of `_load_best_model` by @muellerzr in #30553
- Pass `use_cache` in kwargs for GPTNeoX by @zucchini-nlp in #30538
- Generate: update links on LLM tutorial doc by @gante in #30550
- BlipModel: get_multimodal_features method by @XavierSpycy in #30438
- Add chat templating support for KeyDataset in text-generation pipeline by @DarshanDeshpande in #30558
- Fix generation doctests by @zucchini-nlp in #30263
- Use text config's vocab size in testing models by @zucchini-nlp in #30568
- Encoder-decoder models: move embedding scale to nn.Module by @zucchini-nlp in #30410
- Fix Marian model conversion by @zucchini-nlp in #30173
- Refactor default chat template warnings by @Rocketknight1 in #30551
- Fix QA example by @Rocketknight1 in #30580
- remove jax example by @ArthurZucker in #30498
- Fix canonical model --model_type in examples by @amyeroberts in #30480
- Bump gitpython from 3.1.32 to 3.1.41 in /examples/research_projects/decision_transformer by @dependabot in #30587
- Fix image segmentation example - don't reopen image by @amyeroberts in #30481
- Improve object detection task guideline by @NielsRogge in #29967
- Generate: remove deprecated public decoding functions and streamline logic 🧼 by @gante in #29956
- Fix llava half precision and autocast issues by @frasermince in #29721
- Fix: failing CI after #30568 by @zucchini-nlp in #30599
- Fix for Neuron by @michaelbenayoun in #30259
- Fix memory leak with CTC training script on Chinese languages by @lucky-bai in #30358
- Fix copies for DBRX - neuron fix by @amyeroberts in #30610
- fix: missing `output_router_logits` in SwitchTransformers by @lausannel in #30573
- Use `contiguous()` in clip checkpoint conversion script by @ydshieh in #30613
- phi3 chat_template does not support system role by @amitportnoy in #30606
- Docs: fix `generate`-related rendering issues by @gante in #30600
- Docs: add missing `StoppingCriteria` autodocs by @gante in #30617
- Fix FX tracing issues for Llama by @michaelbenayoun in #30619
- Output `None` as attention when layer is skipped by @jonghwanhyeon in #30597
- Fix CI after #30410 by @zucchini-nlp in #30612
- add mlp bias for llama models by @mayank31398 in #30031
- HQQ: PEFT support for HQQ by @younesbelkada in #30632
- Prevent `TextGenerationPipeline._sanitize_parameters` from overriding previously provided parameters by @yting27 in #30362
- Avoid duplication in PR slow CI model list by @ydshieh in #30634
- [`CI update`] Try to use dockers and no cache by @ArthurZucker in #29202
- Check if the current compiled version of pytorch supports MPS by @jiaqianjing in #30664
- Hotfix-change-ci by @ArthurZucker in #30669
- Quantization / HQQ: Fix HQQ tests on our runner by @younesbelkada in #30668
- Fix llava next tie_word_embeddings config by @SunMarc in #30640
- Trainer._load_from_checkpoint - support loading multiple Peft adapters by @claralp in #30505
- Trainer - add cache clearing and the option for batched eval metrics computation by @FoamoftheSea in #28769
- Fix typo: llama3.md by @mimbres in #30653
- Respect `resume_download` deprecation by @Wauplin in #30620
- top-k instead of top-p in MixtralConfig docstring by @sorgfresser in #30687
- Bump jinja2 from 3.1.3 to 3.1.4 in /examples/research_projects/decision_transformer by @dependabot in #30680
- Bump werkzeug from 3.0.1 to 3.0.3 in /examples/research_projects/decision_transformer by @dependabot in #30679
- Adding _tie_weights() to prediction heads to support low_cpu_mem_usage=True by @hackyon in #29024
- Fix `cache_position` initialisation for generation with `use_cache=False` by @nurlanov-zh in #30485
- Word-level timestamps broken for short-form audio by @kamilakesbi in #30325
- Updated docs of `forward` in `Idefics2ForConditionalGeneration` with correct `ignore_index` value by @zafstojano in #30678
- Bump tqdm from 4.63.0 to 4.66.3 in /examples/research_projects/decision_transformer by @dependabot in #30646
- Bump tqdm from 4.48.2 to 4.66.3 in /examples/research_projects/visual_bert by @dependabot in #30645
- Reboot Agents by @aymeric-roucher in #30387
- Bump tqdm from 4.48.2 to 4.66.3 in /examples/research_projects/lxmert by @dependabot in #30644
- Separate tokenizer tests by @ArthurZucker in #30675
- Update `workflow_id` in `utils/get_previous_daily_ci.py` by @ydshieh in #30695
- Rename artifact name `prev_ci_results` to `ci_results` by @ydshieh in #30697
- Add safetensors to model not found error msg for default use_safetensors value by @davidgxue in #30602
- Pin deepspeed by @muellerzr in #30701
- Patch CLIP image preprocessor by @rootonchair in #30698
- Add examples for detection models finetuning by @qubvel in #30422
- [BitsandBytes] Verify if GPU is available by @NielsRogge in #30533
- Llava: remove dummy labels by @zucchini-nlp in #30706
- Add installation of examples requirements in CI by @qubvel in #30708
- Add dynamic resolution input/interpolate position embedding to SigLIP by @davidgxue in #30719
- Removal of deprecated maps by @LysandreJik in #30576
- KV cache is no longer a model attribute by @zucchini-nlp in #30730
- Generate: consistently handle special tokens as tensors by @gante in #30624
- Update CodeLlama references by @osanseviero in #30218
- [docs] Update es/pipeline_tutorial.md by @aaronjimv in #30684
- Update llama3.md, fix typo by @mimbres in #30739
- mlp_only_layers is more flexible than decoder_sparse_step by @eigen2017 in #30552
- PEFT / Trainer: Make use of `model.active_adapters()` instead of deprecated `model.active_adapter` whenever possible by @younesbelkada in #30738
- [docs] Update link in es/pipeline_webserver.md by @aaronjimv in #30745
- hqq - fix weight check in check_quantized_param by @mobicham in #30748
- Workflow: Replace `actions/post-slack` with centrally defined workflow by @younesbelkada in #30737
- Blip dynamic input resolution by @zafstojano in #30722
- [GroundingDino] Adding ms_deform_attn kernels by @EduardoPach in #30768
- Llama: fix custom 4D masks, v2 by @poedator in #30348
- Generation / FIX: Fix multi-device generation by @younesbelkada in #30746
- enable Pipeline to get device from model by @faaany in #30534
- [Object detection pipeline] Lower threshold by @NielsRogge in #30710
- Generate: remove near-duplicate sample/greedy copy by @gante in #30773
- Port IDEFICS to tensorflow by @a8nova in #26870
- Generate: assistant should be greedy in assisted decoding by @gante in #30778
- Save other CI jobs' result (torch/tf pipeline, example, deepspeed etc) by @ydshieh in #30699
- Deprecate models script by @amyeroberts in #30184
- CI: update to ROCm 6.0.2 and test MI300 by @fxmarty in #30266
- Fix cache type in Idefics2 by @zucchini-nlp in #30729
- PEFT: Access active_adapters as a property in Trainer by @pashminacameron in #30790
- Deprecate TF weight conversion since we have full Safetensors support now by @Rocketknight1 in #30786
- [T5] Adding `model_parallel = False` to `T5ForTokenClassification` and `MT5ForTokenClassification` by @retarfi in #30763
- Added the necessay import of module by @ankur0904 in #30804
- Add support for custom checkpoints in MusicGen by @jla524 in #30011
- Add missing dependencies in image classification example by @jla524 in #30820
- Support mixed-language batches in `WhisperGenerationMixin` by @cifkao in #29688
- Remove unused module DETR based models by @conditionedstimulus in #30823
- Jamba - Skip 4d custom attention mask test by @amyeroberts in #30826
- Missing `Optional` in typing. by @xkszltl in #30821
- Update ds_config_zero3.json by @pacman100 in #30829
- Better llava next. by @nxphi47 in #29850
- Deprecate models script - correctly set the model name for the doc file by @amyeroberts in #30785
- Fix llama model sdpa attention forward function masking bug when output_attentions=True by @Aladoro in #30652
- [LLaVa-NeXT] Small fixes by @NielsRogge in #30841
- [Idefics2] Improve docs, add resources by @NielsRogge in #30717
- Cache: add new flag to distinguish models that `Cache` but not static cache by @gante in #30800
- Disable the FA backend for SDPA on AMD GPUs by @mht-sharma in #30850
- Video-LLaVa: Fix docs by @zucchini-nlp in #30855
- Docs: update example with assisted generation + sample by @gante in #30853
- TST / Quantization: Reverting to torch==2.2.1 by @younesbelkada in #30866
- Fix VideoLlava imports by @amyeroberts in #30867
- TEST: Add llama logits tests by @younesbelkada in #30835
- Remove deprecated logic and warnings by @amyeroberts in #30743
- Enable device map by @darshana1406 in #30870
- Fix dependencies for image classification example by @jla524 in #30842
- [whisper] fix multilingual fine-tuning by @sanchit-gandhi in #30865
- update release script by @ArthurZucker in #30880
New Contributors
- @joaocmd made their first contribution in #23342
- @kamilakesbi made their first contribution in #30121
- @dtlzhuangz made their first contribution in #30262
- @steven-basart made their first contribution in #30405
- @manju-rangam made their first contribution in #30457
- @kyo-takano made their first contribution in #30494
- @mgoin made their first contribution in #30488
- @eitanturok made their first contribution in #30509
- @clinty made their first contribution in #30512
- @warner-benjamin made their first contribution in #30442
- @XavierSpycy made their first contribution in #30438
- @DarshanDeshpande made their first contribution in #30558
- @frasermince made their first contribution in #29721
- @lucky-bai made their first contribution in #30358
- @rb-synth made their first contribution in #30566
- @lausannel made their first contribution in #30573
- @jonghwanhyeon made their first contribution in #30597
- @mobicham made their first contribution in #29637
- @yting27 made their first contribution in #30362
- @jiaqianjing made their first contribution in #30664
- @claralp made their first contribution in #30505
- @mimbres made their first contribution in #30653
- @sorgfresser made their first contribution in #30687
- @nurlanov-zh made their first contribution in #30485
- @zafstojano made their first contribution in #30678
- @davidgxue made their first contribution in #30602
- @rootonchair made their first contribution in #30698
- @eigen2017 made their first contribution in #30552
- @Nilabhra made their first contribution in #30771
- @a8nova made their first contribution in #26870
- @pashminacameron made their first contribution in #30790
- @retarfi made their first contribution in #30763
- @yikangshen made their first contribution in #30005
- @ankur0904 made their first contribution in #30804
- @conditionedstimulus made their first contribution in #30823
- @nxphi47 made their first contribution in #29850
- @Aladoro made their first contribution in #30652
- @hyenal made their first contribution in #30555
- @darshana1406 made their first contribution in #30870
Full Changelog: v4.40.2...v4.41.0