Patch release resolving several critical issues related to the recent cache refactor, the flash attention refactor, and training in multi-GPU and multi-node settings:
- Resolve training bug with PEFT + gradient checkpointing #28031
- Resolve cache issue when going beyond context window for Mistral/Mixtral FA2 #28037
- Re-enable passing `config` to `from_pretrained` with FA #28043
- Fix resuming from checkpoint when using FSDP with FULL_STATE_DICT #27891
- Resolve bug when saving a checkpoint in the multi-node setting #28078