Mostly Gemma2 support for FA2 (FlashAttention-2) softcapping!
This release also fixes the sliding window for long contexts, along with a few typos.
- [Gemma2] Support FA2 softcapping (#31887) by @ArthurZucker
- [ConvertSlow] make sure the order is preserved for addedtokens (#31902) by @ArthurZucker
- Fixes to alternating SWA layers in Gemma2 (#31775) by @turboderp
- Requires for torch.tensor before casting (#31755) by @echarlaix
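
For anyone unfamiliar with softcapping, here is a minimal sketch, assuming the usual tanh formulation in which each attention score is squashed to `cap * tanh(score / cap)`. The function name `softcap` and the cap value are illustrative only; this is plain Python, not the code path touched by the PRs above.

```python
import math


def softcap(scores, cap):
    """Bound each score in magnitude by `cap` using cap * tanh(score / cap).

    Hypothetical helper for illustration; not a transformers or flash-attn API.
    """
    return [cap * math.tanh(s / cap) for s in scores]


# Illustrative raw attention scores, including two extreme values.
raw_scores = [-200.0, -1.0, 0.0, 1.0, 200.0]

# 50.0 is an assumed cap chosen for this example.
capped = softcap(raw_scores, cap=50.0)

for raw, out in zip(raw_scores, capped):
    print(f"{raw:8.1f} -> {out:8.3f}")
```

Small scores pass through almost unchanged, while large ones saturate near the cap, which keeps extreme logits from dominating the softmax.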
I was off last week and could not get this out sooner. Thanks, all, for your patience 🥳