Note: our conda install commands have changed slightly. Version specifiers such as cuda100 in
conda install pytorch cuda100 -c pytorch
have been replaced by cudatoolkit specifiers, so the command is now
conda install pytorch cudatoolkit=10.0 -c pytorch
Breaking Changes
There are no breaking changes in this release.
Bug Fixes
Serious
- Higher order gradients for CPU Convolutions have been fixed (regressed in 1.0.0 under MKL-DNN setting) #15686
- Correct gradients for non-contiguous weights in CPU Convolutions #16301
- Fix ReLU on CPU Integer Tensors by fixing vec256 inversions #15634
- Fix bincount for non-contiguous Tensors #15109
- Fix torch.norm on CPU for large Tensors #15602
- Fix eq_ to do equality on GPU (was doing greater-equal due to a typo) (#15475)
- Workaround a CuDNN bug that gave wrong results in certain strided convolution gradient setups
- blacklist fft algorithms for strided dgrad (#16626)
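Two of the fixes above are easy to sanity-check on an installed build. The following is a minimal sketch, not taken from the PyTorch test suite; the tensor values are made up for illustration, and it falls back to the CPU when no GPU is available (in which case it only checks the general semantics, not the GPU-specific eq_ fix).

```python
import torch

# Use the GPU if one is present; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# In-place equality: only the first position matches.
# A greater-equal comparison would also mark the last position (3 >= 2).
a = torch.tensor([1.0, 2.0, 3.0], device=device)
b = torch.tensor([1.0, 5.0, 2.0], device=device)
eq_result = a.clone().eq_(b).cpu()

# bincount on a non-contiguous view: a column slice of a 2-D tensor.
x = torch.tensor([[0, 1], [1, 2], [2, 2]])
col = x[:, 1]                 # values 1, 2, 2 with stride 2
counts = torch.bincount(col)  # one 1 and two 2s
```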
Correctness
- Fix cuda native loss_ctc for varying input length (#15798)
- this avoids NaNs in variable length settings
- C++ Frontend: Fix serialization (#15033)
- Fixes a bug in (de)serializing a hierarchy of submodules in which one submodule has no parameters of its own, but its submodules do
- Fix derivative for mvlgamma (#15049)
- Fix numerical stability in log_prob for Gumbel distribution (#15878)
- multinomial: fix detection and drawing of zero probability events (#16075)
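As an illustration of the multinomial item above, a category with zero probability should never be drawn. This is a small sketch with made-up probabilities and an arbitrary seed and sample count, not a reproduction of the original bug report.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for repeatability

# Category 0 has zero probability, so it should never appear in the samples.
probs = torch.tensor([0.0, 0.5, 0.5])
samples = torch.multinomial(probs, num_samples=1000, replacement=True)

zero_prob_draws = int((samples == 0).sum())
```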
Crashes
- PyTorch binaries were crashing on AWS Lambda and a few other niche systems, stemming from CPUInfo handling certain warnings as errors. Updated CPUInfo with relevant fixes.
- MKL-DNN is now statically built, to avoid conflicts with system versions
- Allow ReadyQueue to handle empty tasks (#15791)
- Fixes a segfault with a DataParallel + Checkpoint neural network setting
- Avoid integer divide by zero error in index_put_ (#14984)
- Fix for model inference crash on Win10 (#15919) (#16092)
- Use CUDAGuard when serializing Tensors:
- Before this change, torch.save and torch.load would initialize the CUDA context on GPU 0 if it hadn't been initialized already, even if the serialized tensors are only on GPU 1.
- Fix error with handling scalars and rpow, for example 1 ** x, where x is a PyTorch scalar (#16687)
- Switch to CUDA implementation instead of CuDNN if batch size >= 65536 for affine_grid (#16403)
- CuDNN crashes when batch size >= 65536
- [Distributed] TCP init method race condition fix (#15684)
- [Distributed] Fix a memory leak in Gloo's CPU backend
- [C++ Frontend] Fix LBFGS issue around using inplace ops (#16167)
- [Hub] Fix github branch prefix v (#15552)
- [Hub] url download bugfix for URLs served without Content-Length header
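The rpow item above concerns the reflected power operator, where a plain Python number sits on the left of ** and a tensor on the right. A minimal sketch follows, assuming that "PyTorch scalar" means a zero-dimensional tensor; the values are chosen only for illustration.

```python
import torch

x = torch.tensor(3.0)  # zero-dimensional tensor, assumed to be the "PyTorch scalar" case

# A Python int on the left dispatches to the tensor's reflected power method.
one_pow = 1 ** x  # 1 raised to 3
two_pow = 2 ** x  # 2 raised to 3
```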
Performance
- LibTorch binaries now ship with CuDNN enabled. Without this change, many users saw significant performance differences between LibTorch and PyTorch; this should now be fixed. #14976
- Make btriunpack work for high-dimensional batches and run faster than before (#15286)
- improve performance of unique with inverse indices (#16145)
- Re-enable OpenMP in binaries (got disabled because of a CMake refactor)
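For readers unfamiliar with the inverse indices mentioned in the unique item above, here is a short sketch of what they are. It shows only the semantics, not the performance change, and the input values are made up.

```python
import torch

x = torch.tensor([3, 1, 3, 2])

# values holds the sorted unique elements; inverse maps each input
# element to its position in values.
values, inverse = torch.unique(x, return_inverse=True)

# Indexing values with inverse reconstructs the original tensor.
reconstructed = values[inverse]
```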
Other
- create type hint stub files for module torch (#16089)
- This will restore auto-complete functionality in PyCharm, VSCode etc.
- Fix sum_to behavior with zero dimensions (#15796)
- Match NumPy by considering NaNs to be larger than any number when sorting (#15886)
- Fix various error messages / settings in dynamic weight GRU / LSTMs (#15766)
- C++ Frontend: Make call operator on module holder call forward (#15831)
- C++ Frontend: Add the normalize transform to the core library (#15891)
- Fix bug in torch::load and unpack torch::optim::detail namespace (#15926)
- Implements Batched upper triangular, lower triangular (#15257)
- Add torch.roll to documentation (#14880)
- (better errors) Add backend checks for batch norm (#15955)
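Two of the items above, the NaN sorting order and torch.roll, can be illustrated briefly. This is a hedged sketch with made-up inputs: in an ascending sort, NaN is expected to land last (as in NumPy), and torch.roll shifts elements cyclically.

```python
import torch

# Ascending sort: NaN is treated as larger than any number, so it comes last.
x = torch.tensor([3.0, float("nan"), 1.0])
sorted_vals, sorted_idx = torch.sort(x)

# Cyclic shift by two positions: the last two elements wrap to the front.
rolled = torch.roll(torch.tensor([1, 2, 3, 4, 5]), shifts=2)
```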
JIT
- Add better support for bools in the graph fuser (#15057)
- Allow tracing with fork/wait (#15184)
- improve script/no script save error (#15321)
- Add self to Python printer reserved words (#15318)
- Better error when torch.load-ing a JIT model (#15578)
- fix select after chunk op (#15672)
- Add script standard library documentation + cleanup (#14912)