Fix backward compatibility issues with Llama and Gemma:
We mostly made sure that performance is not affected by the new RoPE paradigm change. The RoPE computation was fixed (it should always be done in float32), and the causal_mask dtype was set to bool to use less RAM.
YOLOS had a regression, and the Llama / T5 tokenizers were emitting a spurious warning.
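To illustrate the float32 point, here is a minimal sketch (not the actual transformers implementation; `rotary_embedding_fp32` is a hypothetical helper) of computing RoPE cos/sin tables in full precision and only casting back to the model dtype at the end:

```python
import torch


def rotary_embedding_fp32(positions: torch.Tensor, dim: int,
                          base: float = 10000.0,
                          dtype: torch.dtype = torch.float16):
    """Compute RoPE cos/sin tables in float32, then cast to the model dtype."""
    # Inverse frequencies are always built in float32, even for fp16/bf16 models.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    # Outer product of positions and frequencies, still in float32.
    angles = torch.outer(positions.to(torch.float32), inv_freq)
    emb = torch.cat((angles, angles), dim=-1)
    # Cast only at the very end, so the cos/sin themselves keep full precision.
    return emb.cos().to(dtype), emb.sin().to(dtype)


# Example: tables for 8 positions and a head dimension of 64.
cos, sin = rotary_embedding_fp32(torch.arange(8), dim=64)
```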
- FIX [Gemma] Fix bad rebase with transformers main (#29170)
- Improve _update_causal_mask performance (#29210)
- [T5 and Llama Tokenizer] remove warning (#29346)
- [Llama ROPE] Fix torch export but also slow downs in forward (#29198)
- RoPE loses precision for Llama / Gemma + Gemma logits.float() (#29285)
- Patch YOLOS and others (#29353)
- Use torch.bool instead of torch.int64 for non-persistant causal mask buffer (#29241)
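Regarding the last item, a minimal sketch of what a non-persistent boolean causal mask buffer looks like (a toy module, not the library's actual code): `torch.bool` uses 1 byte per element instead of 8 for `torch.int64`, and `persistent=False` keeps the mask out of the checkpoint.

```python
import torch
from torch import nn


class MaskedDecoder(nn.Module):
    """Toy module with a non-persistent causal mask stored as torch.bool."""

    def __init__(self, max_positions: int = 4096):
        super().__init__()
        causal_mask = torch.tril(
            torch.ones(max_positions, max_positions, dtype=torch.bool)
        )
        # persistent=False keeps the mask out of the state dict; bool keeps it small in RAM.
        self.register_buffer("causal_mask", causal_mask, persistent=False)

    def forward(self, attn_scores: torch.Tensor) -> torch.Tensor:
        seq_len = attn_scores.size(-1)
        mask = self.causal_mask[:seq_len, :seq_len]
        # Future positions (where the mask is False) are set to -inf before softmax.
        return attn_scores.masked_fill(~mask, torch.finfo(attn_scores.dtype).min)
```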