This release fixes the following regressions and silent-correctness issues:

Torch.compile:
- Remove runtime dependency on JAX/XLA when importing `torch._dynamo` (#124634)
- Hide `Plan failed with a cudnnException` warning (#125790)
- Fix CUDA memory leak (#124238) (#120756)
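
A minimal smoke test of the import path touched by the first fix above (a sketch, not an official repro; assumes a working `torch.compile` backend on the host):

```python
import torch
import torch._dynamo  # no longer imports JAX/XLA as a side effect (#124634)

# torch.compile routes through torch._dynamo under the hood.
compiled = torch.compile(lambda x: x * 2)
print(compiled(torch.ones(3)))  # tensor([2., 2., 2.])
```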

Distributed:
- Fix the `format_utils` executable, which was causing it to run as a no-op (#123407)
- Fix regression with `device_mesh` in 2.3.0 causing memory spikes during initialization (#124780)
- Fix crash of FSDP + DTensor with `ShardingStrategy.SHARD_GRAD_OP` (#123617)
- Fix failure with distributed checkpointing + FSDP if at least one forward/backward pass has not been run (#121544) (#127069)
- Fix error with distributed checkpointing + FSDP with `use_orig_params=False` and activation checkpointing (#124698) (#126935)
- Fix `set_model_state_dict` errors on compiled modules with non-persistent buffers under distributed checkpointing (#125336) (#125337)
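
A minimal sketch of the FSDP + distributed-checkpointing combination covered by the fixes above (assumes a CUDA machine and a `torchrun --nproc-per-node=N` launch; the toy model and the `demo_ckpt` path are illustrative, not from the fixes themselves):

```python
import torch
import torch.distributed as dist
import torch.distributed.checkpoint as dcp
import torch.nn as nn
from torch.distributed.checkpoint.state_dict import get_model_state_dict
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import ShardingStrategy

def main() -> None:
    dist.init_process_group("nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    model = FSDP(
        nn.Linear(16, 16).cuda(),
        sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,  # strategy named in #123617
        use_orig_params=False,  # combination named in #124698 / #126935
    )

    # Run one forward/backward pass; saving a checkpoint before any pass
    # previously failed (#121544).
    model(torch.randn(4, 16, device="cuda")).sum().backward()

    # Save a distributed checkpoint ("demo_ckpt" is an illustrative path).
    dcp.save({"model": get_model_state_dict(model)}, checkpoint_id="demo_ckpt")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```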

MPS:
- Fix data corruption when copying large (>4GiB) tensors (#124635)
- Fix `Tensor.abs()` for complex tensors (#125662)
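
A quick check of the complex `abs()` fix above, as a sketch (assumes an Apple-silicon machine with MPS available; complex support on MPS is still partial):

```python
import torch

if torch.backends.mps.is_available():
    # Before #125662, abs() of a complex tensor on MPS could return wrong values.
    z = torch.tensor([3 + 4j, -1 - 1j], device="mps")
    print(z.abs())  # expected: tensor([5.0000, 1.4142], device='mps:0')
```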

Packaging:
- Fix UTF-8 encoding of Windows `.pyi` files (#124932)
- Fix `import torch` failure when the wheel is installed for a single user on Windows (#125684)
- Fix compatibility with torchdata 0.7.1 (#122616)
- Fix aarch64 docker publishing to https://ghcr.io (#125617)
- Fix performance regression on aarch64 Linux (pytorch/builder#1803)

Other:
- Fix DeepSpeed transformer extension build on ROCm (#121030)
- Fix kernel crash on `tensor.dtype.to_complex()` after ~100 calls in an IPython kernel (#125154)
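
The `to_complex()` crash above came from repeated calls in a long-lived interpreter; a minimal sketch of that pattern:

```python
import torch

# Before #125154, repeating this ~100 times inside an IPython kernel
# could crash the interpreter.
for _ in range(200):
    cdtype = torch.float32.to_complex()
assert cdtype == torch.complex64
```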
Release tracker #125425 contains all relevant pull requests related to this release as well as links to related issues.