This release fixes the following regressions and silent-correctness issues:

Torch.compile:
- Remove runtime dependency on JAX/XLA when importing `torch._dynamo` (#124634)
- Hide `Plan failed with a cudnnException` warning (#125790)
- Fix CUDA memory leak (#124238) (#120756)
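
A minimal smoke test of the import path touched by the first fix above (a sketch, not an official repro; assumes a working `torch.compile` backend on the host):

```python
import torch
import torch._dynamo  # no longer imports JAX/XLA as a side effect (#124634)

# torch.compile routes through torch._dynamo under the hood.
compiled = torch.compile(lambda x: x * 2)
print(compiled(torch.ones(3)))  # tensor([2., 2., 2.])
```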

Distributed:
- Fix the `format_utils` executable, which was causing it to run as a no-op (#123407)
- Fix regression with `device_mesh` in 2.3.0 causing memory spikes during initialization (#124780)
- Fix crash of FSDP + DTensor with `ShardingStrategy.SHARD_GRAD_OP` (#123617)
- Fix failure with distributed checkpointing + FSDP if at least one forward/backward pass has not been run (#121544) (#127069)
- Fix error with distributed checkpointing + FSDP with `use_orig_params=False` and activation checkpointing (#124698) (#126935)
- Fix `set_model_state_dict` errors on compiled modules with non-persistent buffers under distributed checkpointing (#125336) (#125337)
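
A minimal sketch of the FSDP + distributed-checkpointing combination covered by the fixes above (assumes a CUDA machine and a `torchrun --nproc-per-node=N` launch; the toy model and the `demo_ckpt` path are illustrative, not from the fixes themselves):

```python
import torch
import torch.distributed as dist
import torch.distributed.checkpoint as dcp
import torch.nn as nn
from torch.distributed.checkpoint.state_dict import get_model_state_dict
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import ShardingStrategy

def main() -> None:
    dist.init_process_group("nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    model = FSDP(
        nn.Linear(16, 16).cuda(),
        sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,  # strategy named in #123617
        use_orig_params=False,  # combination named in #124698 / #126935
    )

    # Run one forward/backward pass; saving a checkpoint before any pass
    # previously failed (#121544).
    model(torch.randn(4, 16, device="cuda")).sum().backward()

    # Save a distributed checkpoint ("demo_ckpt" is an illustrative path).
    dcp.save({"model": get_model_state_dict(model)}, checkpoint_id="demo_ckpt")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```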

MPS:
- Fix data corruption when copying large (>4GiB) tensors (#124635)
- Fix `Tensor.abs()` for complex tensors (#125662)
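
A quick check of the complex `abs()` fix above, as a sketch (assumes an Apple-silicon machine with MPS available; complex support on MPS is still partial):

```python
import torch

if torch.backends.mps.is_available():
    # Before #125662, abs() of a complex tensor on MPS could return wrong values.
    z = torch.tensor([3 + 4j, -1 - 1j], device="mps")
    print(z.abs())  # expected: tensor([5.0000, 1.4142], device='mps:0')
```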

Packaging:
- Fix UTF-8 encoding of Windows `.pyi` files (#124932)
- Fix `import torch` failure when the wheel is installed for a single user on Windows (#125684)
- Fix compatibility with torchdata 0.7.1 (#122616)
- Fix aarch64 docker publishing to https://ghcr.io (#125617)
- Fix performance regression on aarch64 Linux (pytorch/builder#1803)

Other:
- Fix DeepSpeed transformer extension build on ROCm (#121030)
- Fix kernel crash on `tensor.dtype.to_complex()` after ~100 calls in an IPython kernel (#125154)
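
The `to_complex()` crash above came from repeated calls in a long-lived interpreter; a minimal sketch of that pattern:

```python
import torch

# Before #125154, repeating this ~100 times inside an IPython kernel
# could crash the interpreter.
for _ in range(200):
    cdtype = torch.float32.to_complex()
assert cdtype == torch.complex64
```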
Release tracker #125425 contains all relevant pull requests related to this release as well as links to related issues.