This release fixes the following regressions and silent-correctness issues:
- Remove spurious warning in comparison ops (#112170)
- Fix segfault in foreach_* operations when input list lengths do not match (#112349)
- Fix the CUDA driver API to load the appropriate .so file (#112996)
- Fix missing CUDA initialization when calling FFT operations (#110326)
- Ignore beartype==0.16.0 within the onnx package as it is incompatible (#111861)
- Fix the behavior of torch.new_zeros in onnx due to TorchScript behavior change (#111694)
- Remove unnecessary slow code in torch.distributed.checkpoint.optimizer.load_sharded_optimizer_state_dict (#111687)
- Add planner argument to torch.distributed.checkpoint.optimizer.load_sharded_optimizer_state_dict (#111393)
- Continue if a parameter does not exist during sharded load in torch.distributed.FSDP (#109116)
- Fix handling of a non-contiguous bias_mask in torch.nn.functional.scaled_dot_product_attention (#112673)
- Fix the meta device implementation for nn.functional.scaled_dot_product_attention (#110893)
- Fix copies from the MPS to the CPU device when storage_offset is non-zero (#109557)
- Fix segfault in torch.sparse.mm for non-contiguous inputs (#111742)
- Fix circular import between Dynamo and einops (#110575)
- Verify flatbuffer module fields are initialized for mobile deserialization (#109794)
Issue #110961 tracks all pull requests included in this release, along with links to the related issues.