This release fixes the following issues (regressions / silent correctness):
- Fix missing OpenMP support on Apple Silicon binaries (pytorch/builder#1697)
- Fix crash when mixing lazy and non-lazy tensors in one operation (#117653)
- Fix PyTorch performance regression on Linux aarch64 (pytorch/builder#1696)
- Fix silent correctness in DTensor `_to_copy` operation (#116426)
- Fix properly assigning `param.grad_fn` for next forward (#116792)
- Ensure gradients clear out pending `AsyncCollectiveTensor` in FSDP Extension (#116122)
- Fix processing unflatten tensor on compute stream in FSDP Extension (#116559)
- Fix FSDP `AssertionError` on tensor subclass when setting `sync_module_states=True` (#117336)
- Fix DCP state_dict cannot correctly find FQN when the leaf module is wrapped by FSDP (#115592)
- Fix OOM when returning an `AsyncCollectiveTensor` by forcing `_gather_state_dict()` to be synchronous with respect to the main stream (#118197) (#119716)
- Fix Windows runtime `torch.distributed.DistNetworkError`: [WinError 32] The process cannot access the file because it is being used by another process (#118860)
- Update supported Python versions in package description (#119743)
- Fix SIGILL crash during `import torch` on CPUs that do not support SSE4.1 (#116623)
- Fix DCP RuntimeError in `get_state_dict` and `set_state_dict` (#119573)
- Fixes for HSDP + TP integration with device_mesh (#112435) (#118620) (#119064) (#118638) (#119481)
- Fix numerical error with `mixedmm` on NVIDIA V100 (#118591)
- Fix RuntimeError when using SymInt input invariant when splitting graphs (#117406)
- Fix compile `DTensor.from_local` in trace_rule lookup (#119659)
- Improve torch.compile integration with CUDA 11.8 binaries (#119750)
Release tracker #119295 contains all relevant pull requests related to this release as well as links to related issues.