PyTorch 1.5.1 Release Notes

Backwards Incompatible Changes
Known Issues and Workarounds
Critical Fixes
Crashes and Error Fixes
Other Fixes

Backwards Incompatible Changes

Autograd: Operations that return integer-type tensors now always return tensors that don’t require grad (#37789).

This most notably affects torch.argmin, torch.argmax, and torch.argsort. This change is BC-breaking because in 1.5.0 one could obtain an integer-type tensor that requires grad. However, such tensors were not usable by autograd; calling .backward() on them resulted in an error, so most users are unlikely to have relied on this behavior.

Version 1.5.0:

>>> tensor = torch.randn(3, requires_grad=True)
>>> torch.argmax(tensor).requires_grad
True

Version 1.5.1:

>>> tensor = torch.randn(3, requires_grad=True)
>>> torch.argmax(tensor).requires_grad
False
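Code that uses the index tensor only to look up values should be unaffected, since gradients flow through the selected values rather than through the indices. A minimal sketch, assuming PyTorch 1.5.1 or later; the tensor names and shapes are made up for illustration:

```python
import torch

# Hypothetical scores for 4 samples over 5 classes.
scores = torch.randn(4, 5, requires_grad=True)

# argmax returns int64 indices, which never require grad in 1.5.1.
idx = torch.argmax(scores, dim=1)

# Gradients still reach `scores` through the gathered values.
best = scores.gather(1, idx.unsqueeze(1)).squeeze(1)
best.sum().backward()

print(idx.requires_grad)   # False
print(scores.grad.sum())   # one unit of gradient per row
```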

Known Issues and Workarounds

When using multiprocessing, PyTorch 1.5.1 and 1.5.0 may error out with complaints about incompatibility between MKL and libgomp (#37377)

You may see error messages like the following when using the torch.multiprocessing package. This bug has primarily affected users with AMD CPUs.

`Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library.
        Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.`

You can get rid of the error and the error message by setting the environment variable MKL_THREADING_LAYER=GNU. This can be done either by including the following in your Python code:

import os
os.environ['MKL_THREADING_LAYER'] = 'GNU'

or by specifying the environment variable when running your script:

MKL_THREADING_LAYER=GNU python my_script.py

To learn more about what triggers this bug and other workarounds if the above isn’t working, please read this comment on the issue.
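If the in-code route is used, one option is to wrap the assignment in a small helper that runs at the very top of the entry script. The sketch below assumes that the variable has to be in place before any MKL-backed library (such as numpy or torch) is imported, and that a value already set by the caller should win; the helper name `use_gnu_threading_layer` is hypothetical.

```python
import os

def use_gnu_threading_layer():
    """Set MKL_THREADING_LAYER=GNU unless the environment already defines it."""
    # setdefault keeps a value exported by the shell or a launcher script.
    return os.environ.setdefault("MKL_THREADING_LAYER", "GNU")

chosen = use_gnu_threading_layer()
print(chosen)

# Import numpy / torch only after this point, for example:
# import torch
# import torch.multiprocessing as mp
```

Child processes inherit the parent's environment, so setting the variable once in the parent before spawning workers is normally enough.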

Critical Fixes

`torch.multinomial`: Fixed a bug where CUDA `multinomial` generated the same sequence over and over again with a shift of 4. (#38046)

`nn.Conv2d`: Fixed a bug where circular padding applied padding across the wrong dimension (#37881)

Version 1.5.0:

>>> circular = nn.Conv2d(6, 1, (3, 3), padding=(0, 1), padding_mode='circular')
>>> circular(torch.zeros(1, 6, 10, 10)).shape
# Notice the padding is incorrectly on the H dimension, not the W dimension.
torch.Size([1, 1, 10, 8])

Version 1.5.1:

>>> circular = nn.Conv2d(6, 1, (3, 3), padding=(0, 1), padding_mode='circular')
>>> circular(torch.zeros(1, 6, 10, 10)).shape
torch.Size([1, 1, 8, 10])
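The correct output shape for padding=(0, 1) can be checked by hand with the usual convolution output-size formula. This is a sketch that assumes circular padding adds the same number of elements per side as zero padding; the helper `conv_out_size` is made up for illustration and is not a PyTorch function.

```python
def conv_out_size(size, kernel, pad, stride=1, dilation=1):
    """Output length of one spatial dimension of a convolution."""
    return (size + 2 * pad - dilation * (kernel - 1) - 1) // stride + 1

# Input is 10 x 10, kernel is 3 x 3, padding=(0, 1) means pad_h=0, pad_w=1.
out_h = conv_out_size(10, kernel=3, pad=0)
out_w = conv_out_size(10, kernel=3, pad=1)
print((out_h, out_w))  # (8, 10)
```

H shrinks by 2 because it is not padded, while W keeps its size, which matches torch.Size([1, 1, 8, 10]).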

Fixed a bug where asserts in CUDA kernels were mistakenly disabled, leading to many silent kernel errors. (#38943, #39047, #39218)

`torch.gather`, `torch.scatter`: added checks for illegal input dtypes that caused silently incorrect behaviors (#38025, #38646)

`torch.argmin`, `torch.argmax`: Fixed silently incorrect result for inputs with more than 2^32 elements (#39212)

C++ Custom Operators: fixed a bug where custom operators stopped working with autograd and ignored the `requires_grad=True` flag. (#37355)

Crashes and Error Fixes

Fixed CUDA reduction operations on inputs with more than 2^32 elements (#37788)

Version 1.5.0:

>>> torch.zeros(5, 14400, 14400, device='cuda').sum(0)
RuntimeError: sub_iter.strides(0)[0] == 0 INTERNAL ASSERT FAILED at /pytorch/aten/src/ATen/native/cuda/Reduce.cuh:706, please report a bug to PyTorch.

Version 1.5.1:

>>> torch.zeros(5, 14400, 14400, device='cuda').sum(0)
# No problem

Fixed pickling of PyTorch operators (#38033)

Version 1.5.0:

>>> pickle.dumps(torch.tanh)
PicklingError: Can't pickle : it's not the same object as torch._C._VariableFunctions

Version 1.5.1:

>>> pickle.dumps(torch.tanh)
# No problem
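Pickling matters in practice whenever an operator is handed to another process, because multiprocessing serializes function arguments with pickle. A small round-trip check, assuming PyTorch 1.5.1 or later, might look like this:

```python
import pickle
import torch

payload = pickle.dumps(torch.tanh)   # raised PicklingError on 1.5.0
restored = pickle.loads(payload)

# The restored object should behave like the original operator.
x = torch.linspace(-1.0, 1.0, steps=5)
print(torch.equal(restored(x), torch.tanh(x)))
```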

`nn.LeakyReLU`: Fixed a bug where using autograd with in-place `nn.LeakyReLU` with a slope of 0 incorrectly errored out. (#37453, #37559)

Version 1.5.0:

>>> tensor = torch.randn(3, requires_grad=True)
>>> other = tensor + 1
>>> output = nn.LeakyReLU(0, inplace=True)(other)
>>> output.sum().backward()
RuntimeError: In-place leakyReLu backward calculation is triggered with a non-positive slope which is not supported. This is caused by calling in-place forward function with a non-positive slope, please call out-of-place version instead.

Version 1.5.1:

>>> tensor = torch.randn(3, requires_grad=True)
>>> other = tensor + 1
>>> output = nn.LeakyReLU(0, inplace=True)(other)
>>> output.sum().backward()
# No error
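For code that still has to run on 1.5.0, the error message itself names a workaround: use the out-of-place version. The sketch below follows that suggestion and checks the resulting gradient; the `expected` tensor is only there for illustration.

```python
import torch
import torch.nn as nn

tensor = torch.randn(3, requires_grad=True)
other = tensor + 1

# Out-of-place LeakyReLU with slope 0, as the 1.5.0 error message recommends.
output = nn.LeakyReLU(0)(other)
output.sum().backward()

# With slope 0 the gradient is 1 where the input was positive and 0 elsewhere.
expected = (other > 0).to(tensor.dtype)
print(torch.equal(tensor.grad, expected))
```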

`torch.as_strided`: Fixed crash when passed `sizes` and `strides` of different lengths. (#39301)

`nn.SyncBatchNorm.convert_sync_batchnorm`: Fixed bug where it did not respect the devices of the original BatchNorm module, resulting in device mismatch errors (#39344)

`nn.utils.clip_grad_norm_`: Fixed ability to operate on tensors on different devices (#38615)

`torch.min`, `torch.max`: added check for illegal output dtypes (#38850)

MacOS: Fixed `import torch` error (#36941).

C++ Extensions: fixed compilation error when building with older versions of nvcc (#37221)

This bug mainly affected users of Ubuntu 16.04. We’re certain it affected the following configurations:

Ubuntu 16.04 + CUDA 9.2 + GCC 5
Ubuntu 16.04 + CUDA 9.2 + GCC 7
Ubuntu 16.04 + CUDA 10.0 + GCC 5

C++ Extensions: fixed ability to compile with paths that include spaces (#38860, #38670)

C++ Extensions: fixed ability to compile with relative `include_dirs` for ahead-of-time compilation (#38264)

Other Fixes

`nn.Conv1d`, `nn.Conv2d`, `nn.Conv3d`: Fixed a bug where convolutions were using more memory than previous versions of PyTorch. (#38674)

Fixed in-place floor division magic method (#38695)

In 1.5.0, the in-place floor division magic method mistakenly performed the floor division out-of-place. We’ve fixed this in 1.5.1.

Version 1.5.0:

>>> tensor = torch.ones(1)
>>> expected_data_ptr = tensor.data_ptr()
>>> tensor //= 1
>>> tensor.data_ptr() == expected_data_ptr
False

Version 1.5.1:

>>> tensor = torch.ones(1)
>>> expected_data_ptr = tensor.data_ptr()
>>> tensor //= 1
>>> tensor.data_ptr() == expected_data_ptr
True
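The difference is visible to any code that holds a view of the same storage: with a true in-place update the view sees the new values, whereas an out-of-place result only rebinds the name on the left-hand side. A small sketch, assuming PyTorch 1.5.1 or later; the values are arbitrary:

```python
import torch

tensor = torch.full((3,), 7.0)
view = tensor[:2]                 # shares storage with `tensor`
ptr_before = tensor.data_ptr()

tensor //= 2                      # in-place floor division

print(tensor.data_ptr() == ptr_before)
print(view)                       # reflects the update through shared storage
```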

Documentation: fixed link to Java docs. (#39039)

Quantization: Fixed weight quantization inaccuracies for LSTM (#35961)

Weight quantization was done incorrectly for LSTMs: the statistics for all weights (across layers) were combined in the observer. This meant that weights for later layers in an LSTM would use sub-optimal scales, impacting accuracy. The problem gets worse as the number of layers increases.
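To see why pooled statistics hurt, consider a simplified min/max observer that derives the quantization scale from the observed weight range. The sketch below is an illustration under that assumption, not PyTorch's observer code; the helper `minmax_scale` and the weight values are made up.

```python
def minmax_scale(values, qmin=-128, qmax=127):
    """Scale of a simplified min/max observer: observed range / number of steps."""
    lo, hi = min(values), max(values)
    return (hi - lo) / (qmax - qmin)

# Made-up weights: the first layer has a much wider range than the second.
layer1 = [-2.0, 0.5, 2.0]
layer2 = [-0.1, 0.02, 0.1]

per_layer = minmax_scale(layer2)          # statistics from layer 2 only
shared = minmax_scale(layer1 + layer2)    # statistics pooled across layers

# The pooled scale is about 20x coarser for layer 2 in this toy case.
print(per_layer, shared, shared / per_layer)
```

Each additional layer can only widen the pooled range, which is consistent with the problem growing with the number of layers.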

DistributedDataParallel: Fixed single-process multi-GPU use case (#36503)

RPC: Fixed future callbacks not capturing and restoring autograd context id (#38512)

TorchScript: Fixed support with `torch.unique` (#38156)

ONNX: Fixed `pow` operator export (#39791)
