This release is meant to fix the following issues (regressions and silent correctness bugs):
Breaking Changes:
- The pytorch/pytorch Docker image now installs the PyTorch package through pip and has switched its conda installation from Miniconda to Miniforge (#134274)
Windows:
- Fix performance regression on Windows related to MKL static linking (#130619) (#130697)
- Fix error during loading on Windows: [WinError 126] The specified module could not be found. (#131662) (#130697)
MPS:
- Fix tensor.clamp producing wrong values (#130226)
- Fix incorrect result from batch norm with sliced inputs (#133610)
ROCM:
- Fix an invalid configuration error when launching a kernel for embedding with a large index (#130994)
- Add a check and a warning when attempting to use hipBLASLt on an unsupported architecture (#128753)
- Fix image corruption with Memory Efficient Attention when running HuggingFace Diffusers Stable Diffusion 3 pipeline (#133331)
Distributed:
- Fix FutureWarning when using torch.load internally (#130663)
- Fix FutureWarning when using torch.cuda.amp.autocast internally (#130660)
Torch.compile:
- Fix exception with torch.compile when the onnxruntime-training and deepspeed packages are installed (#131194)
- Fix silent incorrectness with torch.library.custom_op with mutable inputs and torch.compile (#133452)
- Fix SIMD detection on Linux ARM (#129075)
- Do not use C++20 features in cpu_inductor code (#130816)
Packaging:
- Fix for exposing statically linked libstdc++ CXX11 ABI symbols (#134494)
- Fix error while building PyTorch from source due to a missing QNNPACK module (#131864)
- Make PyTorch buildable from source on PowerPC (#129736)
- Fix XPU extension building (#132847)
Other:
- Fix warning when using pickle on an nn.Module that contains tensor attributes (#130246)
- Fix NaNs returned by MultiheadAttention when need_weights=False (#130014)
- Fix nested tensor MHA producing incorrect results (#130196)
- Fix error when using torch.utils.flop_counter.FlopCounterMode (#134467)
Tracked Regressions:
- The experimental remote caching feature for Inductor's autotuner (enabled via TORCHINDUCTOR_AUTOTUNE_REMOTE_CACHE) is known to still be broken in this release and is being actively worked on in main. The following error is generated: redis.exceptions.DataError: Invalid input of type: 'dict'. Please use nightlies if you need this feature (reported and fixed by PR #134032)
Release tracker #132400 contains all relevant pull requests related to this release as well as links to related issues.