This release fixes the following regressions and silent-correctness issues:
- Remove spurious warning in comparison ops (#112170)
- Fix segfault in foreach_* operations when input list lengths do not match (#112349)
- Fix the CUDA driver API to load the appropriate .so file (#112996)
- Fix missing CUDA initialization when calling FFT operations (#110326)
- Ignore beartype==0.16.0 within the onnx package as it is incompatible (#111861)
- Fix the behavior of torch.new_zeros in onnx due to TorchScript behavior change (#111694)
- Remove unnecessary slow code in torch.distributed.checkpoint.optimizer.load_sharded_optimizer_state_dict (#111687)
- Add planner argument to torch.distributed.checkpoint.optimizer.load_sharded_optimizer_state_dict (#111393)
- Continue if a parameter does not exist during sharded load in torch.distributed.FSDP (#109116)
- Fix handling of a non-contiguous bias_mask in torch.nn.functional.scaled_dot_product_attention (#112673)
- Fix the meta device implementation for nn.functional.scaled_dot_product_attention (#110893)
- Fix copies from the MPS to the CPU device when storage_offset is non-zero (#109557)
- Fix segfault in torch.sparse.mm for non-contiguous inputs (#111742)
- Fix circular import between Dynamo and einops (#110575)
- Verify flatbuffer module fields are initialized for mobile deserialization (#109794)
Issue #110961 tracks all pull requests included in this release, along with links to the related issues.