Patch release 4.55.3
This patch release focuses on stabilizing FlashAttention-2 on Ascend NPU, improving FSDP behavior for generic-task models, and fixing MXFP4 integration for GPT-OSS.
Bug Fixes & Improvements
- FlashAttention-2 / Ascend NPU – Fix “unavailable” runtime error (#40151) by @FightingZhen
- FlashAttention kwargs – Revert FA kwargs preparation to resolve regression (#40161) by @Cyrilvallez
- FSDP (generic-task models) – Fix sharding/runtime issues (#40191) by @Cyrilvallez
- GPT-OSS / MXFP4 – Ensure `swiglu_limit` is correctly passed through (#40197) by @returnL
- Mamba – Fix cache handling to prevent stale/incorrect state (#40203) by @manueldeprada
- Misc – Minor follow-up fix for #40262 by @ArthurZucker
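The Mamba fix above (#40203) addresses stale recurrent state being carried across generation calls. The sketch below is purely illustrative and is not the actual Transformers cache code: `StateCache`, `update`, and `run_sequence` are hypothetical names used to show the class of bug, where a cache that is not reset between independent sequences contaminates the next result.

```python
# Illustrative sketch only (not the real Transformers implementation):
# a minimal recurrent-state cache showing how skipping a reset between
# independent sequences yields an incorrect, "stale" result.

class StateCache:
    """Hypothetical stand-in for a Mamba-style recurrent state cache."""

    def __init__(self):
        self.state = 0  # toy accumulated hidden state

    def update(self, token_value):
        # Each step folds the new token into the running state.
        self.state += token_value
        return self.state

    def reset(self):
        # Clearing state between unrelated prompts prevents leakage.
        self.state = 0


def run_sequence(cache, tokens, reset_first=True):
    """Process a token sequence, optionally resetting the cache first."""
    if reset_first:
        cache.reset()
    out = None
    for t in tokens:
        out = cache.update(t)
    return out


cache = StateCache()
first = run_sequence(cache, [1, 2, 3])                      # fresh run: 6
stale = run_sequence(cache, [1, 2, 3], reset_first=False)   # stale state: 12
fresh = run_sequence(cache, [1, 2, 3])                      # reset again: 6
```

The same final tokens produce a different result when the cache is reused without a reset, which is the kind of incorrect state the release fixes.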