This patch release contains several fixes: mostly Gemma2 context-length and generation issues, plus small nits here and there.
- is_torchdynamo_compiling -- cast a wide exception net (#32476) by @gante
- Revert "fixes to properly shard FSDP across cpu and meta for cpu_effcient_loading for prequantized 4bit (#32276)" (#32477) by @gante and @matthewdouglas
- Gemma2: fix FA2 generation (#32553) by @zucchini-nlp
- Fix: FA2 with packed training (#32487) by @zucchini-nlp
- Fix sliding window attention used in Gemma2FlashAttention2 (#32522) by @brcps12
- Automatically add transformers tag to the modelcard (#32623) by @LysandreJik
- add back the position ids (#32554) by @ArthurZucker
- Use head_dim if in config for RoPE (#32495) by @suiyoubi and @ArthurZucker
- Revert PR 32299, flag users when Zero-3 was missed (#32851) by @muellerzr
- fix multi-gpu with static cache (#32543) by @SunMarc
- Reduce the error log when using core models that need their weights r… (#32656) by @muellerzr
- Fix VLM generation issues (#32836) by @zucchini-nlp
- Fix generate with inputs_embeds as input (#32493) (this PR includes some cherry-picked commits)
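Several of the fixes above concern Gemma2's sliding window attention. As background, a minimal, dependency-free sketch of what a sliding-window causal mask looks like (illustrative only; this is not the transformers implementation, and the function name is made up for this example):

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Build a boolean attention mask where True means "query i may attend key j".

    Each query position i attends only to itself and the (window - 1)
    positions immediately before it, never to future positions.
    """
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]

# With seq_len=5 and window=3, position 4 attends positions 2, 3, 4 only.
mask = sliding_window_causal_mask(5, 3)
```

In Gemma2 this windowed mask is applied on alternating layers; the fixes above ensure the window is respected by the FlashAttention2 code path as well.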
Full Changelog: v4.44.0...v4.44.1